What is the fastest way to append PyTorch completions to their original seeds?

6za6bjd0  posted on 2023-01-09  in Other

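I start with a 2D tensor of n tokenized seeds, generate prediction logits for it, sort them with argsort(), and take the top m candidates. I am looking for the best way to append each of these predictions to the seed that produced it, creating a new 2D tensor that holds n * m seeds, each one token longer than before. I am new to torch and would like to know whether it has a built-in vectorized way to do this.
Here is pseudocode for what I want it to do: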

import torch

def predict(seeds, coherence_threshold, batch_size=16):
    """Takes in a tensor of tokenized seeds, outputs a tensor of tokenized seeds with completions"""

    dataloader = torch.utils.data.DataLoader(seeds, batch_size=batch_size, shuffle=False)

    new_seeds = []
    with torch.no_grad():
        for batch in dataloader:
            # reference_gpt2 is the model (defined elsewhere); assumed to return
            # logits of shape (batch, seq_len, vocab)
            batch_tensors = reference_gpt2(batch)
            batch_preds = batch_tensors.argsort(descending=True)
            # Top candidates at the last position: shape (batch, coherence_threshold)
            batch_preds_pruned = batch_preds[:, -1, :coherence_threshold]
            # TODO come up with a more efficient way to do this
            for i in range(len(batch)):
                for j in range(batch_preds_pruned.shape[1]):
                    # unsqueeze turns the 0-d token id into a 1-element tensor so it can be concatenated
                    new_seed = torch.cat((batch[i], batch_preds_pruned[i, j].unsqueeze(0)))
                    new_seeds.append(new_seed)

    # Stack the n * m extended seeds into one 2D tensor
    return torch.stack(new_seeds)
Answer #1 (lrpiutwd)

To append the completions to their original seeds quickly, you can use torch.cat(). It concatenates tensors along a given dimension, so a single call can append one token to every seed in a batch instead of looping over rows in Python. The tensors must match in every dimension except the one being concatenated.
For example:

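import torch

# Example tensors; both must have the same number of rows (batch_size)
original_seed = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
completion = torch.tensor([[7], [8]])                 # shape (2, 1)
# Concatenate along the sequence dimension (dim=1)
concatenated = torch.cat((original_seed, completion), dim=1)  # shape (2, 4)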
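
The question asks for m candidates per seed, so each seed has to be repeated m times before torch.cat can line the shapes up. One possible sketch follows. It assumes the model output is a logits tensor of shape (n, seq_len, vocab); the helper name append_top_m and the toy logits are made up for illustration, and torch.topk is swapped in for the argsort-and-slice step in the question, since it returns only the m largest entries in descending order.

```python
import torch

def append_top_m(seeds: torch.Tensor, logits: torch.Tensor, m: int) -> torch.Tensor:
    """Hypothetical helper. seeds: (n, seq_len) token ids; logits: (n, seq_len, vocab).

    Returns a tensor of shape (n * m, seq_len + 1).
    """
    # Top-m token ids at the last position, best first: shape (n, m)
    top_tokens = logits[:, -1, :].topk(m, dim=-1).indices
    # Repeat each seed m times in place: rows become s0, s0, ..., s1, s1, ...
    repeated = seeds.repeat_interleave(m, dim=0)   # (n * m, seq_len)
    # Flatten candidates row by row so they line up with the repeated seeds
    new_tokens = top_tokens.reshape(-1, 1)         # (n * m, 1)
    return torch.cat((repeated, new_tokens), dim=1)

# Toy data standing in for real model output (assumption: vocab size 5)
seeds = torch.tensor([[1, 2], [3, 4]])
logits = torch.zeros(2, 2, 5)
logits[0, -1] = torch.tensor([0.1, 0.9, 0.5, 0.0, 0.2])  # best tokens: 1, then 2
logits[1, -1] = torch.tensor([0.3, 0.0, 0.1, 0.8, 0.7])  # best tokens: 3, then 4

result = append_top_m(seeds, logits, m=2)
print(result)
# tensor([[1, 2, 1],
#         [1, 2, 2],
#         [3, 4, 3],
#         [3, 4, 4]])
```

Inside the batch loop from the question, the per-batch results could be collected in a list and joined once at the end with torch.cat(results, dim=0), which avoids growing a tensor one row at a time.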
