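I start with a 2D tensor of n tokenized seeds, generate prediction logits from it, sort them with argsort(), and take the top m candidates. I'm looking for the best way to append each of these predictions to the seed that produced it, creating a new 2D tensor that now holds n * m seeds, each one token longer. I'm new to torch and would like to know whether it has a built-in vectorized way to do this.
Here is pseudocode for what I want it to do: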
import torch

def predict(seeds, coherence_threshold, batch_size=16):
    """Takes in a tensor of tokenized seeds, outputs a tensor of tokenized seeds with completions"""
    dataloader = torch.utils.data.DataLoader(seeds, batch_size=batch_size, shuffle=False)
    new_seeds = []  # collect rows in a list and stack once at the end
    with torch.no_grad():
        for batch in dataloader:
            # reference_gpt2 is my model, defined elsewhere; it returns logits
            # indexed as (batch, position, vocab)
            batch_tensors = reference_gpt2(batch)
            batch_preds = batch_tensors.argsort(descending=True)
            # top coherence_threshold token ids at the last position
            batch_preds_pruned = batch_preds[:, -1, :coherence_threshold]
            # TODO come up with a more efficient way to do this
            for i in range(len(batch)):
                for j in range(batch_preds_pruned.shape[1]):
                    # unsqueeze turns the 0-d token id into a 1-element tensor
                    new_seed = torch.cat((batch[i], batch_preds_pruned[i, j].unsqueeze(0)))
                    new_seeds.append(new_seed)
    return torch.stack(new_seeds)
1 Answer
To append each predicted token to the seed that produced it without a Python loop, you can use torch.cat(). This function concatenates tensors along a given dimension, so once every seed row is lined up with one candidate token, a single call appends the whole column of candidates at once.
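As a rough sketch of how that could look for the n seeds and m candidates in the question: repeat each seed m times with repeat_interleave, then concatenate the candidate ids as one extra column. The function name expand_seeds and the hand-written logits are made up for illustration, and topk is used here in place of argsort followed by slicing; it assumes the logits are shaped (n, seq_len, vocab) as in the question's indexing.

```python
import torch

def expand_seeds(batch, logits, m):
    """Append each of the top-m next-token candidates to its seed.

    batch:  (n, seq_len) integer token ids
    logits: (n, seq_len, vocab) model outputs (assumed layout)
    returns a tensor of shape (n * m, seq_len + 1)
    """
    # Top-m token ids at the last position, best first: shape (n, m)
    top_tokens = logits[:, -1, :].topk(m, dim=-1).indices
    # Repeat every seed m times in place: rows are s0, s0, ..., s1, s1, ...
    repeated = batch.repeat_interleave(m, dim=0)
    # Flatten the candidates in the same row order and append them as a column
    return torch.cat((repeated, top_tokens.reshape(-1, 1)), dim=1)

# Toy check with hand-written logits instead of a real model
batch = torch.tensor([[1, 2], [3, 4]])
logits = torch.zeros(2, 2, 5)
logits[0, -1] = torch.tensor([0.1, 0.9, 0.3, 0.0, 0.5])
logits[1, -1] = torch.tensor([0.7, 0.0, 0.2, 0.8, 0.1])
result = expand_seeds(batch, logits, m=2)
print(result)
```

Inside the question's predict loop, this would replace the two nested for loops: call it once per batch, collect the results in a list, and torch.cat them along dim 0 at the end.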
For example: