Given a sentence pair (query, text) as input to a Transformer model (e.g. BertModel), how can we select the n-grams (unigrams and bigrams) from the text that receive the highest attention scores with respect to the query?
query = "machine learning"
text = """
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function,
which can be used for mapping new examples.
"""
Possible answers: "machine learning", "supervised learning", "function", "labeled training".
To get started quickly, we can extract the attention scores with the following snippet:
from transformers import AutoTokenizer, BertModel
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
query = "machine learning"
text = """
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object
(typically a vector) and a desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function,
which can be used for mapping new examples.
"""
# tokenize the query and text as a sentence pair
input = tokenizer(text=query, text_pair=text, max_length=128,
padding="max_length", truncation=True, return_tensors="pt")
# {'input_ids': tensor([[ 101, 3698, 4083, 102, 13588, 4083, 2003, 1996, 3698, 4083,
# 4708, 1997, 4083, 1037, 3853, 2008, 7341, 2019, 7953, 2000,
# 2019, 6434, 2241, 2006, 2742, 7953, 1011, 6434, 7689, 1012,
# 2009, 1999, 24396, 1037, 3853, 2013, 12599, 2731, 2951, 5398,
# 1997, 1037, 2275, 1997, 2731, 4973, 1012, 1999, 13588, 4083,
# 1010, 2169, 2742, 2003, 1037, 3940, 5398, 1997, 2019, 7953,
# 4874, 1006, 4050, 1037, 9207, 1007, 1998, 1037, 9059, 6434,
# 3643, 1006, 2036, 2170, 1996, 26653, 4742, 1007, 1012, 1037,
# 13588, 4083, 9896, 17908, 2015, 1996, 2731, 2951, 1998, 7137,
# 2019, 1999, 7512, 5596, 3853, 1010, 2029, 2064, 2022, 2109,
# 2005, 12375, 2047, 4973, 1012, 102, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0]])}
# Make predictions with the model
outputs = model(**input)
# Extract attention scores with shape (batch_size, num_heads, sequence_length, sequence_length)
attention_scores = outputs['attentions']
1 Answer
xienkqul1#
You need to think about what
highest attention score concerning the query?
actually means here. The BERT base model has 12 layers, each with 12 attention heads, which means outputs['attentions'] contains 12 attention tensors, each of shape (bs, 12, sl, sl). You can start with something like this:
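As a rough sketch of such a starting point (my own assumptions, not the answer's exact code, reusing attention_scores, input and tokenizer from the snippet above): pick a layer, average over its heads, take the attention that the two query tokens (positions 1 and 2, between [CLS] and the first [SEP] in the printed input_ids) pay to every position, mask out the padding, and look at the top-10 tokens.

def top_tokens_for_layer(layer_idx, k=10):
    # attention_scores is a tuple of 12 tensors, each of shape (batch, heads, seq_len, seq_len)
    attn = attention_scores[layer_idx][0].mean(dim=0)   # average over the 12 heads -> (seq_len, seq_len)
    query_positions = [1, 2]                            # "machine", "learning" in the input_ids above
    scores = attn[query_positions, :].mean(dim=0)       # how strongly the query tokens attend to each position
    scores = scores * input["attention_mask"][0]        # zero out the padding positions
    top_ids = input["input_ids"][0][scores.topk(k).indices].tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(top_tokens_for_layer(0))    # first layer
print(top_tokens_for_layer(-1))   # last layer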
This can be considered one form of "highest attention score". You will notice that the result changes depending on which attention layer is used: with the last layer the top-10 tokens are
'.... [SEP] ) ), learning is'
while with the first layer you get 'supervised learning the machine [SEP] learning task learning. a'.
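Coming back to the original question's unigram/bigram goal, one possible extension (my own assumption, not part of this answer) is to score each pair of adjacent text tokens by the mean of their per-token scores and keep the best pairs, again reusing the variables from the snippet above:

def top_bigrams_for_layer(layer_idx, k=5):
    attn = attention_scores[layer_idx][0].mean(dim=0)                 # average over heads
    scores = attn[[1, 2], :].mean(dim=0) * input["attention_mask"][0]
    tokens = tokenizer.convert_ids_to_tokens(input["input_ids"][0].tolist())
    # restrict to positions in the text segment (token_type_ids == 1)
    text_pos = input["token_type_ids"][0].nonzero(as_tuple=True)[0].tolist()
    bigrams = [(tokens[i] + " " + tokens[j], (scores[i] + scores[j]).item() / 2)
               for i, j in zip(text_pos, text_pos[1:])]
    # note: this still includes the final [SEP]; a fuller version would also
    # filter special tokens and merge WordPiece pieces back into words
    return sorted(bigrams, key=lambda pair: pair[1], reverse=True)[:k]

print(top_bigrams_for_layer(0))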
You should also think about what the attention scores are actually doing (in this case, helping the model fill in masked tokens), what problem you are trying to solve, and why you believe the answer to your question can be found in the attention weights.