How can I pad variable-length sequences that vary in more than one dimension in PyTorch?

ulydmbyx  posted on 2022-12-13 in Other

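Is there a clean way in PyTorch to create a batch of 3D sequences? Each sequence has shape (sequence_length_lvl1, sequence_length_lvl2, D); sequence_length_lvl1 and sequence_length_lvl2 differ from sequence to sequence, but D is the same for all of them. I want to pad these sequences along the first and second dimensions and stack them into a batch. I cannot use PyTorch's pad_sequence function directly, because it only works when the sequences vary in length along a single dimension. Does anyone know a simple way to do this?
To make this clearer, here is an example. Suppose the input sequences are: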

input1 = [
[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[4, 4, 4], [5, 5, 5]]
]

input2 = [
[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[6, 6, 6]],
[[4, 4, 4], [5, 5, 5]]
]

I want to pad [input1, input2]. The desired output is:

output = [
[[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[4, 4, 4], [5, 5, 5], [0, 0, 0]],
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]],

[[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[6, 6, 6], [0, 0, 0], [0, 0, 0]],
[[4, 4, 4], [5, 5, 5], [0, 0, 0]]]
]

So the expected output has shape (2, 3, 3, 3).

flvtvl50 #1

This works for your example; there may be a faster way.

input1 = [
    [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    [[4, 4, 4], [5, 5, 5]]
    ]

input2 = [
    [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    [[6, 6, 6]],
    [[4, 4, 4], [5, 5, 5]]
    ]

len_max = max(len(input1), len(input2))
output_val = [[], []]
pad_row = [0, 0, 0]  # one padding row of width D = 3

for i in range(len_max):
    # Copy the i-th sub-sequence (or start from an empty one) so that
    # input1 and input2 are not modified in place
    a = list(input1[i]) if i < len(input1) else []
    b = list(input2[i]) if i < len(input2) else []

    # Pad each sub-sequence to the inner length of 3 (hard-coded for this example)
    a += [list(pad_row) for _ in range(3 - len(a))]
    b += [list(pad_row) for _ in range(3 - len(b))]

    output_val[0].append(a)
    output_val[1].append(b)

print('-------------\n', output_val)

z2acfund #2

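You can use the text2array library, which can do this kind of padding no matter how deeply the sequences are nested (disclaimer: I am the author). Install it with pip install text2array, then: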

from text2array import Batch

arr = Batch([{'x': input1}, {'x': input2}]).to_array()
print(arr['x'])

This prints:

array([[[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]],

       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[6, 6, 6],
         [0, 0, 0],
         [0, 0, 0]],

        [[4, 4, 4],
         [5, 5, 5],
         [0, 0, 0]]]])

The output is a NumPy array, but you can easily convert it to a PyTorch tensor with torch.from_numpy.
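
As a minimal sketch of that conversion, assuming NumPy and PyTorch are installed: the array below is typed out by hand as a stand-in for arr['x'], so the snippet does not depend on text2array, and the name `padded` is only illustrative.

```python
import numpy as np
import torch

# Stand-in for arr['x']: an already padded array of shape (2, 3, 3, 3)
padded = np.zeros((2, 3, 3, 3), dtype=np.int64)
padded[0, 0] = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
padded[0, 1, :2] = [[4, 4, 4], [5, 5, 5]]
padded[1, 0] = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
padded[1, 1, :1] = [[6, 6, 6]]
padded[1, 2, :2] = [[4, 4, 4], [5, 5, 5]]

# torch.from_numpy shares memory with the NumPy array instead of copying it
tensor = torch.from_numpy(padded)
print(tensor.shape)  # torch.Size([2, 3, 3, 3])
print(tensor.dtype)  # torch.int64

# .float() returns a float32 copy, which is what most models expect as input
tensor_f = tensor.float()
```

Because the tensor and the array share memory, later in-place changes to one are visible in the other; take a copy first if that is not wanted.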

cclgggtu #3

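I am not sure about PyTorch data structures, but if they are list-like, you can use my solution.
This function pads the missing values in each dimension (width, height, and depth) with 0 so that every input is resized to the maximum size. It works for any number of inputs, not just 2. First, find the maximum width, maximum height, and maximum depth over all inputs (e.g. input1 and input2). Then, for each input, fill the missing cells with 0 and collect the padded inputs in one list.
This method does not require any additional libraries.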

def fill_missing_dimension(inputs):
    output = []

    # find max width, height, depth among all inputs
    max_width = max(len(i) for i in inputs)
    max_height = max(len(j) for i in inputs for j in i)
    max_depth = max(len(k) for i in inputs for j in i for k in j)

    # fill missing dimension with 0 for all inputs (note: this pads the inputs in place)
    for seq in inputs:  # named `seq` rather than `input` to avoid shadowing the built-in
        for i in range(len(seq)):
            # pad each existing row up to max_depth
            for j in range(len(seq[i])):
                for k in range(len(seq[i][j]), max_depth):
                    seq[i][j].append(0)
            # add the missing rows
            for j in range(len(seq[i]), max_height):
                seq[i].append([0] * max_depth)
        # add the missing blocks; the comprehension gives every row its own list
        for i in range(len(seq), max_width):
            seq.append([[0] * max_depth for _ in range(max_height)])

        # collect the padded input
        output.append(seq)

    return output

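If you think the code above is too long, here is a shorter list-comprehension version of the same function (although it is harder to read and understand):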

# comprehension version of fill_missing_dimension (does not modify the inputs)
def fill_missing_dimension(inputs):
    max_width = max(len(i) for i in inputs)
    max_height = max(len(j) for i in inputs for j in i)
    max_depth = max(len(k) for i in inputs for j in i for k in j)
    return [[[[seq[i][j][k] if k < len(seq[i][j]) else 0 for k in range(max_depth)]
              if j < len(seq[i]) else [0] * max_depth for j in range(max_height)]
             if i < len(seq) else [[0] * max_depth for _ in range(max_height)]
             for i in range(max_width)]
            for seq in inputs]

Example

input1 = [
[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[4, 4, 4], [5, 5, 5]]
]

input2 = [
[[1, 1, 1], [2, 2, 2], [3, 3, 3]], 
[[6, 6, 6]],
[[4, 4, 4], [5, 5, 5]]
]

output = fill_missing_dimension([input1, input2])

Output:

> output

[[[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
  [[4, 4, 4], [5, 5, 5], [0, 0, 0]],
  [[0, 0, 0], [0, 0, 0], [0, 0, 0]]],
 [[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
  [[6, 6, 6], [0, 0, 0], [0, 0, 0]],
  [[4, 4, 4], [5, 5, 5], [0, 0, 0]]]]

If you want to use the output as a NumPy array, you can call np.array() as follows:

import numpy as np
# convert to numpy array
output = np.array(output)
print(output.shape) # (2, 3, 3, 3)
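
One pitfall to keep in mind when padding nested Python lists: a block of zeros built with list multiplication, such as [[0] * 3] * 2, repeats references to a single inner list, so writing into one padded row later changes all of them. This is harmless if the padded result is only read (for example, passed straight to np.array), but a comprehension avoids the surprise. A small illustration, with variable names chosen only for this example:

```python
# Outer list multiplication repeats references to ONE inner list
aliased = [[0] * 3] * 2
aliased[0][0] = 9
print(aliased)      # [[9, 0, 0], [9, 0, 0]]

# A comprehension builds a fresh inner list for every row
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 9
print(independent)  # [[9, 0, 0], [0, 0, 0]]
```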

vcudknz3 #4

You can still do this with pad_sequence, but you need an initial for loop to make the second-to-last dimension uniform.

import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [
    [
        torch.Tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]]),
        torch.Tensor([[4, 4, 4], [5, 5, 5]])
    ],
    [
        torch.Tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]]),
        torch.Tensor([[6, 6, 6]]),
        torch.Tensor([[4, 4, 4], [5, 5, 5]])
    ]
]

padded_sequences = []
# Loop through the sequences for the initial padding
for sequence in sequences:
    padded_sequences.append(pad_sequence(sequence,
                                         batch_first=True,
                                         padding_value=0))

# The shapes for the tensors in padded_sequences are now:
# (2, 3, 3)
# (3, 3, 3)

padded_sequences = pad_sequence(padded_sequences,
                                batch_first=True,
                                padding_value=0)
print(padded_sequences.shape)
print(padded_sequences)

This produces the desired tensor with only a single for loop.
Output:

torch.Size([2, 3, 3, 3])
tensor([[[[1., 1., 1.],
          [2., 2., 2.],
          [3., 3., 3.]],

         [[4., 4., 4.],
          [5., 5., 5.],
          [0., 0., 0.]],

         [[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]]],

        [[[1., 1., 1.],
          [2., 2., 2.],
          [3., 3., 3.]],

         [[6., 6., 6.],
          [0., 0., 0.],
          [0., 0., 0.]],

         [[4., 4., 4.],
          [5., 5., 5.],
          [0., 0., 0.]]]])
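
The two-pass pad_sequence approach relies on every sample reaching the same maximum inner length after the first pass (3 for both samples here), because pad_sequence requires all trailing dimensions of its inputs to match. When that does not hold, one option is to allocate the padded tensor up front and copy each sub-sequence into place. The following is a minimal sketch under that assumption; `pad_nested` is a hypothetical helper name, not part of PyTorch:

```python
import torch


def pad_nested(batch, padding_value=0.0):
    """Pad a list of samples, each a list of (length, D) tensors, into one 4D tensor.

    Hypothetical helper for illustration; assumes every tensor has the same D.
    """
    max_outer = max(len(sample) for sample in batch)                 # longest sample
    max_inner = max(t.shape[0] for sample in batch for t in sample)  # longest sub-sequence
    feat_dim = batch[0][0].shape[1]

    # Start from a tensor filled with the padding value, then copy the data in
    out = torch.full((len(batch), max_outer, max_inner, feat_dim), padding_value)
    for b, sample in enumerate(batch):
        for s, t in enumerate(sample):
            out[b, s, : t.shape[0]] = t
    return out


# Inner lengths differ between samples: (1, 2) in the first, (4,) in the second
batch = [
    [torch.ones(1, 3), 2 * torch.ones(2, 3)],
    [3 * torch.ones(4, 3)],
]
padded = pad_nested(batch)
print(padded.shape)  # torch.Size([2, 2, 4, 3])
```

The explicit loops keep the sketch easy to follow; for large batches a vectorized variant may be preferable.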

rpppsulh #5

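To create a batch of 3D sequences in PyTorch, you can use the torch.nn.utils.rnn.pad_sequence function. It lets you specify the padding value, which must be a single scalar (0 in your case) rather than a tensor, and each call pads along one dimension only. The inputs must also be tensors, and a ragged nested list cannot be converted to a tensor directly, so the inner level (sequence_length_lvl2) has to be padded first and the outer level (sequence_length_lvl1) second.

Example:

import torch
from torch.nn.utils.rnn import pad_sequence

# Define your sequences
seq1 = [[[1, 1, 1], [2, 2, 2], [3, 3, 3]], [[4, 4, 4], [5, 5, 5]]]
seq2 = [[[1, 1, 1], [2, 2, 2], [3, 3, 3]], [[6, 6, 6]], [[4, 4, 4], [5, 5, 5]]]

# The nested lists are ragged, so torch.Tensor(seq1) would fail; convert each
# inner sub-sequence to its own tensor and pad the inner level first
seq1 = pad_sequence([torch.Tensor(s) for s in seq1], batch_first=True, padding_value=0.0)
seq2 = pad_sequence([torch.Tensor(s) for s in seq2], batch_first=True, padding_value=0.0)

# padding_value must be a scalar, not a tensor; now pad the outer level across the batch
batch = pad_sequence([seq1, seq2], batch_first=True, padding_value=0.0)

# The shape of the batch tensor should be (2, 3, 3, 3)
print(batch.shape)

This should produce the desired output shape (2, 3, 3, 3). Note that each pad_sequence call pads to the longest sequence in the list it receives: the inner calls pad the second dimension within each sample, and the final call pads the first dimension across the batch. In this case both dimensions are padded to length 3.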

Output:

tensor([[[[1., 1., 1.],
          [2., 2., 2.],
          [3., 3., 3.]],

         [[4., 4., 4.],
          [5., 5., 5.],
          [0., 0., 0.]],

         [[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]]],

        [[[1., 1., 1.],
          [2., 2., 2.],
          [3., 3., 3.]],

         [[6., 6., 6.],
          [0., 0., 0.],
          [0., 0., 0.]],

         [[4., 4., 4.],
          [5., 5., 5.],
          [0., 0., 0.]]]])
