PyTorch: selecting text terms based on BERT attention scores

Asked by 2lpgd968 on 2023-11-19

Given a sequence pair (query, text) as input to a Transformer model (such as BertModel), how can we select the n-grams (unigrams and bigrams) in the text that receive the highest attention scores with respect to the query?

query = "machine learning"
text = """
         Supervised learning is the machine learning task of learning a function that
         maps an input to an output based on example input-output pairs. It infers a
         function from labeled training data consisting of a set of training examples.
         In supervised learning, each example is a pair consisting of an input object
         (typically a vector) and a desired output value (also called the supervisory signal).
         A supervised learning algorithm analyzes the training data and produces an inferred function,
         which can be used for mapping new examples.
      """


Possible answers: machine learning, supervised learning, function, labeled training.

As a quick start, we can extract the attention scores with the following snippet:

from transformers import AutoTokenizer, BertModel
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

query = "machine learning"
text = """
         Supervised learning is the machine learning task of learning a function that
         maps an input to an output based on example input-output pairs. It infers a
         function from labeled training data consisting of a set of training examples.
         In supervised learning, each example is a pair consisting of an input object
         (typically a vector) and a desired output value (also called the supervisory signal).
         A supervised learning algorithm analyzes the training data and produces an inferred function,
         which can be used for mapping new examples.
      """
# tokenize the query and text as a sentence pair
input = tokenizer(text=query, text_pair=text, max_length=128,
                  padding="max_length", truncation=True, return_tensors="pt")
# {'input_ids': tensor([[  101,  3698,  4083,   102, 13588,  4083,  2003,  1996,  3698,  4083,
#           4708,  1997,  4083,  1037,  3853,  2008,  7341,  2019,  7953,  2000,
#           2019,  6434,  2241,  2006,  2742,  7953,  1011,  6434,  7689,  1012,
#           2009,  1999, 24396,  1037,  3853,  2013, 12599,  2731,  2951,  5398,
#           1997,  1037,  2275,  1997,  2731,  4973,  1012,  1999, 13588,  4083,
#           1010,  2169,  2742,  2003,  1037,  3940,  5398,  1997,  2019,  7953,
#           4874,  1006,  4050,  1037,  9207,  1007,  1998,  1037,  9059,  6434,
#           3643,  1006,  2036,  2170,  1996, 26653,  4742,  1007,  1012,  1037,
#          13588,  4083,  9896, 17908,  2015,  1996,  2731,  2951,  1998,  7137,
#           2019,  1999,  7512,  5596,  3853,  1010,  2029,  2064,  2022,  2109,
#           2005, 12375,  2047,  4973,  1012,   102,     0,     0,     0,     0,
#              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
#              0,     0,     0,     0,     0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#          0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#          0, 0, 0, 0, 0, 0, 0, 0]])}

# Make predictions with the model (no gradients needed, we only inspect attentions)
with torch.no_grad():
    outputs = model(**input)

# Extract attention scores with shape (batch_size, num_heads, sequence_length, sequence_length)
attention_scores = outputs['attentions']


Answer 1, by xienkqul:

You need to think about what exactly "highest attention score concerning the query" means. The BERT base model has 12 layers, each with 12 attention heads, which means outputs['attentions'] contains 12 attention tensors of size (bs, 12, sl, sl).
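For example, you can verify that structure directly. This is only a quick check, assuming the outputs variable from the question's snippet above:

# outputs.attentions is a tuple with one attention tensor per layer
print(len(outputs.attentions))       # 12 layers for bert-base-uncased
print(outputs.attentions[-1].shape)  # torch.Size([1, 12, 128, 128]) = (bs, num_heads, sl, sl)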
You can start with something like this:

# mask tokens based on query/response, ignoring padding
query_mask = (input['attention_mask']==1) * (input['token_type_ids']==0)
text_mask = (input['attention_mask']==1) * (input['token_type_ids']==1)

# grab attention from final layer of the model
attention = outputs.attentions[-1]

# this subsets the attention matrix to size `(12, n_query_tokens, n_response_tokens)`
# this has to be a for loop because the values of `n_query_tokens` and `n_response_tokens` will be different for different batch items
bs = attention.shape[0]
extracted_attentions = []
for i in range(bs):
    q_mask = query_mask[i]
    t_mask = text_mask[i]

    # index per batch item so the result has shape (num_heads, n_query_tokens, n_response_tokens)
    masked_attention = attention[i][:, q_mask][:, :, t_mask]

    extracted_attentions.append(masked_attention)

# grab attention submatrix for first batch item, as an example
extracted_attention = extracted_attentions[0]

# average over number of heads and query tokens
mean_attentions = torch.mean(extracted_attention, (0,1))

# get topk mean attention values
k = 10
topk_attention_values, topk_idxs = mean_attentions.topk(k)

# get response tokens for first batch item
response_token_ids = input['input_ids'][0][text_mask[0]]

# grab topk tokens by attention score
topk_tokens = response_token_ids[topk_idxs]

# decode back to tokens
tokenizer.decode(topk_tokens)

This can be considered one form of "highest attention score". You will notice that the results change depending on which attention layer we use.
If you use the last layer, the top 10 tokens are '.... [SEP] ) ), learning is'. If you use the first layer, you get 'supervised learning the machine [SEP] learning task learning. a'.
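If you want to compare the layers yourself, one way is to rerun the same ranking for every element of outputs.attentions. This is only a sketch reusing query_mask, text_mask, input and tokenizer from the code above; the per-layer loop itself is not part of the original answer:

# rank the top text tokens for every layer, using the same masking and averaging as above
for layer_idx, layer_attention in enumerate(outputs.attentions):
    # (num_heads, n_query_tokens, n_response_tokens) for the first batch item
    att = layer_attention[0][:, query_mask[0]][:, :, text_mask[0]]
    scores = att.mean(dim=(0, 1))  # average over heads and query tokens
    top_ids = input['input_ids'][0][text_mask[0]][scores.topk(10).indices]
    print(layer_idx, tokenizer.decode(top_ids))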
More generally, you should think about what the attention scores are actually doing (in this case, helping the model solve a masked-token filling objective), what problem you are trying to solve, and why you expect the answer to your question to be found in the attention weights.
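The code above only ranks single tokens (unigrams), while the question also asks about bigrams. One possible extension, purely as a sketch, is to score each pair of adjacent text tokens by the mean of their two token-level attention values; that scoring rule is an assumption, not part of the answer above. It reuses mean_attentions, response_token_ids and tokenizer from the earlier code:

# score each adjacent token pair (bigram) as the mean of its two unigram attention scores
# (assumption: simple averaging; other combination rules are equally plausible)
bigram_scores = (mean_attentions[:-1] + mean_attentions[1:]) / 2

# take the top 5 bigrams by that score
topk_bigram_values, topk_bigram_starts = bigram_scores.topk(5)

for start in topk_bigram_starts.tolist():
    pair_ids = response_token_ids[start:start + 2]
    print(tokenizer.decode(pair_ids))

Note that "adjacent tokens" here means adjacent WordPiece tokens, so some of the decoded pairs will be subword fragments rather than clean word-level bigrams.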
