How to split a numpy array into batches?

Posted by ki0zmccv 12 months ago in Other

This sounds easy, but I can't figure out how to do it.
I have a 2D numpy array:

X.shape  # (1783, 30)

I want to split it into batches of 64. I wrote this code:

batches = len(X) // BATCH_SIZE + 1  # gives 28

I am trying to predict the results batch by batch, so I fill each batch with zeros and overwrite them with the predicted results.

predicted = []

for b in xrange(batches): 

 data4D = np.zeros([BATCH_SIZE,1,96,96]) #create 4D array, first value is batch_size, last number of inputs
 data4DL = np.zeros([BATCH_SIZE,1,1,1]) # need to create 4D array as output, first value is  batch_size, last number of outputs
 data4D[0:BATCH_SIZE,:] = X[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE,:] # fill value of input xtrain

 #predict
 #print [(k, v[0].data.shape) for k, v in net.params.items()]
 net.set_input_arrays(data4D.astype(np.float32),data4DL.astype(np.float32))
 pred = net.forward()
 print 'batch ', b
 predicted.append(pred['ip1'])

print 'Total in Batches ', data4D.shape, batches
print 'Final Output: ', predicted

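But the last batch, number 28, has only 55 elements instead of 64 (1783 elements in total), and it raises an error [error message not preserved].
What is going on here?
PS: the network requires a batch size of exactly 64 for prediction.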
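
The size of the last batch can be checked directly. This is a minimal sketch, assuming a zero-filled stand-in array with the shape given in the question; it shows that slicing past the end of a NumPy array does not raise but silently returns fewer rows, which is why the last slice no longer fits a 64-row buffer.

```python
import numpy as np

BATCH_SIZE = 64
X = np.zeros((1783, 30))  # stand-in with the shape from the question

# 1783 = 27 * 64 + 55, so 27 full batches plus 55 leftover rows.
full_batches, remainder = divmod(len(X), BATCH_SIZE)
print(full_batches, remainder)  # 27 55

# Slicing past the end does not raise; it just returns fewer rows.
last = X[27 * BATCH_SIZE:28 * BATCH_SIZE]
print(last.shape)  # (55, 30)
```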

numpy

Source: https://stackoverflow.com/questions/28507052/how-to-split-numpy-array-in-batches

5 answers

rjzwgtxy1#

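I don't fully understand your question either, especially what X looks like. If you want to split the array into subgroups of equal size, try this: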

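def group_list(l, group_size):
    """
    :param l:           list (or any sliceable sequence, e.g. a numpy array)
    :param group_size:  size of each group
    :return:            Yields successive group-sized lists from l.
                        The last group is shorter if len(l) is not a
                        multiple of group_size.
    """
    # range replaces the Python 2 xrange
    for i in range(0, len(l), group_size):
        yield l[i:i+group_size]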
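
As a usage sketch, the same generator can be applied to a NumPy array, since slicing along the first axis behaves like list slicing. The function is repeated here in Python 3 form so the example runs on its own, and the small array X is a hypothetical stand-in.

```python
import numpy as np

def group_list(l, group_size):
    """Yield successive group_size-sized slices from l."""
    for i in range(0, len(l), group_size):
        yield l[i:i + group_size]

X = np.arange(10).reshape(5, 2)  # hypothetical 5x2 example array
sizes = [len(g) for g in group_list(X, 2)]
print(sizes)  # [2, 2, 1]
```

The last group is shorter than the others, so this alone does not satisfy a model that needs a fixed batch size.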

u5i3ibmn2#

I found a simple way to solve the batching problem: create a dummy array and then fill it with the necessary data.

data = np.zeros((batches * BATCH_SIZE, 1, 96, 96))  # np.zeros takes the shape as one tuple
# gives a dummy array of shape (28*64, 1, 96, 96)

The code below loads the data in batches of exactly 64. The last batch will have dummy zeros at the end, but that is fine :)

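pred = []
# data4D is the (BATCH_SIZE, 1, 96, 96) buffer from the question
for b in range(batches):
    data4D[0:BATCH_SIZE, :] = data[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]
    # append each batch result instead of overwriting the list
    pred.append(net.predict(data4D))

# assumes net.predict returns one row per input; keep the first 1783
output = np.concatenate(pred)[:1783]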

Finally, I slice the first 1783 elements out of the 28*64 total. This works for me, but I am sure there are many other ways.
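
The whole padding approach can be sketched end to end as follows. This is only an illustration under stated assumptions: `fake_predict` is a hypothetical placeholder for the network (it returns one value per row), and X is random stand-in data with the shape from the question.

```python
import numpy as np

BATCH_SIZE = 64
X = np.random.rand(1783, 30)  # stand-in for the real inputs

def fake_predict(batch):
    # Hypothetical placeholder for the network: one value per row.
    assert batch.shape[0] == BATCH_SIZE
    return batch.sum(axis=1)

n_batches = -(-len(X) // BATCH_SIZE)  # ceiling division: 28
padded = np.zeros((n_batches * BATCH_SIZE, X.shape[1]), dtype=X.dtype)
padded[:len(X)] = X  # real rows first, zero rows at the end

preds = [fake_predict(padded[b * BATCH_SIZE:(b + 1) * BATCH_SIZE])
         for b in range(n_batches)]
output = np.concatenate(preds)[:len(X)]  # drop predictions for the padding
print(output.shape)  # (1783,)
```

Every call sees exactly 64 rows, and the final slice discards the predictions made on the zero rows.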

ht4b089n3#

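Starting with Python 3.12 you can use the itertools.batched function.
For older Python versions, or if you are working with numpy arrays, you can use np.reshape to batch the array.
Suppose you have batch_size=2; then use the batch size as the second dimension when reshaping.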

>>> np.arange(10).reshape(-1, batch_size)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

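The first dimension is the "batch" index and the second dimension is batch_size. Iterating over the first dimension yields consecutive batches.
If you have a multidimensional array, such as: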

>>> array_2d = np.arange(30).reshape(6,5)
>>> array_2d
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

You can again use the second dimension for batching:

>>> array_2d.reshape(3, batch_size, 5)
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])

>>> array_2d.reshape(3, batch_size, 5)[0]  # sequential items when iterating
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Note that this requires the first dimension to be divisible by batch_size, so either drop the remainder (e.g. array_2d[:len(array_2d) // batch_size * batch_size]) or pad with zeros (see np.pad).
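
Both options for handling a remainder can be sketched as follows. This is a minimal example, assuming a batch_size of 4 so that the 6 rows of array_2d do not divide evenly; the variable names are illustrative only.

```python
import numpy as np

batch_size = 4
array_2d = np.arange(30).reshape(6, 5)  # 6 rows, not divisible by 4

# Option 1: drop the remainder rows.
n_full = len(array_2d) // batch_size * batch_size
truncated = array_2d[:n_full].reshape(-1, batch_size, 5)
print(truncated.shape)  # (1, 4, 5)

# Option 2: zero-pad the first axis up to the next multiple of batch_size.
pad_rows = -len(array_2d) % batch_size
padded = np.pad(array_2d, ((0, pad_rows), (0, 0)), mode="constant")
batched = padded.reshape(-1, batch_size, 5)
print(batched.shape)  # (2, 4, 5)
```

With padding, the last two rows of the second batch are zeros and usually need to be discarded from whatever is computed on them.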

i2byvkas4#

This can be done with numpy's as_strided.

import numpy as np
from numpy.lib.stride_tricks import as_strided
def batch_data(test, batch_size):
    m,n = test.shape
    S = test.itemsize
    if not batch_size:
        batch_size = m
    count_batches = m//batch_size
    # Batches which can be covered fully
    test_batches = as_strided(test, shape=(count_batches, batch_size, n), strides=(batch_size*n*S,n*S,S)).copy()
    covered = count_batches*batch_size
    if covered < m:
        rest = test[covered:,:]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest,np.zeros((mismatch,rn)))).reshape(1,-1,n)
        return np.vstack((test_batches,last_batch))
    return test_batches
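
The hand-computed strides above assume a C-contiguous input. As a hedged variation (not part of the answer's code), the strides can instead be read from the array itself, which for a contiguous array gives the same result as a plain reshape. The array and names below are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(30).reshape(6, 5)
batch_size = 2
n_batches = len(arr) // batch_size
row_stride, col_stride = arr.strides  # bytes to step one row / one column

view = as_strided(
    arr,
    shape=(n_batches, batch_size, arr.shape[1]),
    strides=(batch_size * row_stride, row_stride, col_stride),
)
# For a contiguous array this matches a plain reshape.
same = np.array_equal(view, arr.reshape(n_batches, batch_size, -1))
print(same)  # True
```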

vcirk6k65#

data4D[0:BATCH_SIZE,:] should be data4D[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :].
