Given a sentence pair (query, text) as input to a Transformer model (e.g. BertModel), how can we select the n-grams (unigrams and bigrams) from the text that receive the highest attention scores with respect to the query?
query = "machine learning"
text = """
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function,
which can be used for mapping new examples.
"""
Possible answers: "machine learning", "supervised learning", "function", "labeled training".
To get started quickly, we can extract the attention scores with the following snippet:
from transformers import AutoTokenizer, BertModel
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
query = "machine learning"
text = """
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function,
which can be used for mapping new examples.
"""
# tokenize the query and text as a sentence pair
input = tokenizer(text=query, text_pair=text, max_length=128,
padding="max_length", truncation=True, return_tensors="pt")
# {'input_ids': tensor([[ 101, 3698, 4083, 102, 13588, 4083, 2003, 1996, 3698, 4083,
# 4708, 1997, 4083, 1037, 3853, 2008, 7341, 2019, 7953, 2000,
# 2019, 6434, 2241, 2006, 2742, 7953, 1011, 6434, 7689, 1012,
# 2009, 1999, 24396, 1037, 3853, 2013, 12599, 2731, 2951, 5398,
# 1997, 1037, 2275, 1997, 2731, 4973, 1012, 1999, 13588, 4083,
# 1010, 2169, 2742, 2003, 1037, 3940, 5398, 1997, 2019, 7953,
# 4874, 1006, 4050, 1037, 9207, 1007, 1998, 1037, 9059, 6434,
# 3643, 1006, 2036, 2170, 1996, 26653, 4742, 1007, 1012, 1037,
# 13588, 4083, 9896, 17908, 2015, 1996, 2731, 2951, 1998, 7137,
# 2019, 1999, 7512, 5596, 3853, 1010, 2029, 2064, 2022, 2109,
# 2005, 12375, 2047, 4973, 1012, 102, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]])}
# Make predictions with the model
outputs = model(**input)
# Extract attention scores with shape (batch_size, num_heads, sequence_length, sequence_length)
attention_scores = outputs['attentions']
1 Answer
xienkqul1#
You need to think about what
highest attention score concerning the query?
actually means here. The BERT base model has 12 layers, each with 12 attention heads, which means outputs['attentions'] contains 12 attention tensors, each of shape (bs, 12, sl, sl). You can start with something like this:
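As a rough sketch of such a starting point (my own assumptions, not the answer's exact code, reusing attention_scores, input and tokenizer from the snippet above): pick a layer, average over its heads, take the attention that the two query tokens (positions 1 and 2, between [CLS] and the first [SEP] in the printed input_ids) pay to every position, mask out the padding, and look at the top-10 tokens.

def top_tokens_for_layer(layer_idx, k=10):
    # attention_scores is a tuple of 12 tensors, each of shape (batch, heads, seq_len, seq_len)
    attn = attention_scores[layer_idx][0].mean(dim=0)   # average over the 12 heads -> (seq_len, seq_len)
    query_positions = [1, 2]                            # "machine", "learning" in the input_ids above
    scores = attn[query_positions, :].mean(dim=0)       # how strongly the query tokens attend to each position
    scores = scores * input["attention_mask"][0]        # zero out the padding positions
    top_ids = input["input_ids"][0][scores.topk(k).indices].tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(top_tokens_for_layer(0))    # first layer
print(top_tokens_for_layer(-1))   # last layer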
This can be considered one form of "highest attention score". You will notice that the result changes depending on which attention layer is used: with the last layer the top-10 tokens are
'.... [SEP] ) ), learning is'
while with the first layer you get 'supervised learning the machine [SEP] learning task learning. a'.
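Coming back to the original question's unigram/bigram goal, one possible extension (my own assumption, not part of this answer) is to score each pair of adjacent text tokens by the mean of their per-token scores and keep the best pairs, again reusing the variables from the snippet above:

def top_bigrams_for_layer(layer_idx, k=5):
    attn = attention_scores[layer_idx][0].mean(dim=0)                 # average over heads
    scores = attn[[1, 2], :].mean(dim=0) * input["attention_mask"][0]
    tokens = tokenizer.convert_ids_to_tokens(input["input_ids"][0].tolist())
    # restrict to positions in the text segment (token_type_ids == 1)
    text_pos = input["token_type_ids"][0].nonzero(as_tuple=True)[0].tolist()
    bigrams = [(tokens[i] + " " + tokens[j], (scores[i] + scores[j]).item() / 2)
               for i, j in zip(text_pos, text_pos[1:])]
    # note: this still includes the final [SEP]; a fuller version would also
    # filter special tokens and merge WordPiece pieces back into words
    return sorted(bigrams, key=lambda pair: pair[1], reverse=True)[:k]

print(top_bigrams_for_layer(0))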
You should also think about what the attention scores are actually doing (in this case, helping the model fill in masked tokens), what problem you are trying to solve, and why you believe the answer to your question can be found in the attention weights.