Python: can you sort one variable into bins based on another in a vectorized way?

Posted by owfi6suc on 2023-06-20 in Python

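Question

I have several variables x that I want to sort into a variable binned_list according to some bins.
As an example, x is a random vector with two components, each ranging from 0 to 10/sqrt(2), and I want to sort it into the list binned_list by the modulus of x. I have three bins for the modulus: [0, 3.33), [3.33, 6.66) and [6.66, 10). I want to save the different iterations of x into binned_list, which is a list of 3 lists, each holding the x vectors whose modulus falls in the corresponding bin.
I can do it as follows: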

import numpy as np
N_bins = 3
Bins = np.linspace(0, 10, N_bins+1)

binned_list = [[] for b in range(N_bins)]

N_elements = 5

np.random.seed(1)

for k in range(3):
    x = np.random.random((N_elements,2))/np.sqrt(2)*10
    mod_x = np.sqrt(x[:,0]**2 + x[:,1]**2)
    dig_x = np.digitize( mod_x, bins = Bins ) - 1

    for y in range(len(x)):
        binned_list[dig_x[y]].append(x[y])

Output:

[[array([8.08752089e-04, 2.13781412e+00]),
  array([1.03772086, 0.65293247]),
  array([1.31705859, 2.44348333]),
  array([0.99268556, 1.40078906]),
  array([0.60135339, 0.27615902])],
 [array([2.94879087, 5.09346334]),
  array([2.80556972, 3.81000966]),
  array([2.96415284, 4.84523355]),
  array([1.44569572, 6.20922794]),
  array([0.19365953, 4.74092123]),
  array([2.95079056, 3.95053366]),
  array([2.21624362, 4.89546016]),
  array([1.20088241, 6.20940519])],
 [array([5.66211915, 6.84664326]), array([6.19700713, 6.32582438])]]
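
To make the bin assignment in the loop above easier to check, here is a minimal sketch of what np.digitize returns for a few hand-picked moduli. The values in sample_mod are hypothetical and chosen only for illustration; they are not taken from the random data above.

```python
import numpy as np

N_bins = 3
Bins = np.linspace(0, 10, N_bins + 1)   # edges: 0, 3.33..., 6.66..., 10

# Hypothetical moduli chosen for illustration, two per bin.
sample_mod = np.array([0.0, 2.1, 3.4, 6.0, 7.5, 9.99])

# With the default right=False, np.digitize returns i such that
# Bins[i-1] <= value < Bins[i], so subtracting 1 gives a 0-based bin index.
sample_idx = np.digitize(sample_mod, bins=Bins) - 1
print(sample_idx)   # [0 0 1 1 2 2]
```

Note that a modulus of exactly 10 would map to index 3, which is outside binned_list. With np.random.random drawing from [0, 1), the modulus stays strictly below 10, so this case should not arise in the code above.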

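Question

Once I have digitized the elements of x, can I avoid looping over them to store them in binned_list? I would like to do this in a vectorized way to make the code more efficient.
I was thinking of something like this: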

binned_list[dig_x].append(x)

But I cannot index a list with an array. Likewise, if I define binned_list as an array, I cannot append to it.

vatpfxk5 #1

You can avoid the nested for loop by using a boolean mask with nonzero(), so that you only loop over the bins:

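x = np.random.random((N_elements,2))/np.sqrt(2)*10
mod_x = np.sqrt(np.sum(x**2,1))          # modulus of each row
dig_x = np.digitize(mod_x,bins=Bins)-1   # 0-based bin index of each row
for i in range(N_bins):
    # rows of x whose modulus falls in bin i; note that this assignment
    # replaces the bin's contents (use .extend(...) to accumulate over iterations)
    binned_list[i] = x[(dig_x==i).nonzero()]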
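
If even the loop over bins should go, one possible alternative (not part of the answer above) is to sort the rows by bin index and cut the sorted array at the bin boundaries. The sketch below assumes every bin index lies in 0..n_bins-1; group_by_bin is a hypothetical helper name, and np.hypot and np.random.default_rng are substituted here for the original modulus and seeding code.

```python
import numpy as np

def group_by_bin(x, bin_idx, n_bins):
    """Split the rows of x into n_bins arrays according to bin_idx (hypothetical helper)."""
    # Stable sort: rows are ordered by bin, original order is kept within a bin.
    order = np.argsort(bin_idx, kind="stable")
    counts = np.bincount(bin_idx, minlength=n_bins)
    # Cumulative counts mark where each bin ends in the sorted array.
    return np.split(x[order], np.cumsum(counts)[:-1])

N_bins = 3
Bins = np.linspace(0, 10, N_bins + 1)

rng = np.random.default_rng(1)
x = rng.random((15, 2)) / np.sqrt(2) * 10
mod_x = np.hypot(x[:, 0], x[:, 1])           # modulus of each row
dig_x = np.digitize(mod_x, bins=Bins) - 1    # 0-based bin index of each row

binned = group_by_bin(x, dig_x, N_bins)
for i, rows in enumerate(binned):
    print(i, rows.shape)
```

The result is a list of arrays rather than a list of lists. To accumulate over several iterations, one option is to collect the per-iteration arrays and call np.concatenate per bin at the end.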

vohkndzv #2

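I compared @mpw2's answer with my own approach, and his is indeed somewhat faster: roughly 1.5 to 5 times faster for the different numbers of iterations and elements I tried: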

import numpy as np
import time

N_bins = 3
Bins = np.linspace(0, 10, N_bins+1)

binned_list1 = [[] for b in range(N_bins)]
binned_list2 = [[] for b in range(N_bins)]

N_elements = 10000

np.random.seed(1)

k_iter = 10

x = np.random.random((k_iter, N_elements,2))/np.sqrt(2)*10

start_time = time.time()
for k in range(k_iter):
    mod_x = np.sqrt(x[k,:,0]**2 + x[k,:,1]**2)
    dig_x = np.digitize( mod_x, bins = Bins ) - 1

    for y in range(len(x[k])):
        binned_list1[dig_x[y]].append(x[k,y])

print("1: %s s" % (time.time() - start_time))

start_time = time.time()
for k in range(k_iter):
    mod_x = np.sqrt(x[k,:,0]**2 + x[k,:,1]**2)
    dig_x = np.digitize( mod_x, bins = Bins ) - 1

    for i in range(N_bins):
        binned_list2[i].extend(x[k,(dig_x==i).nonzero()[0]])

print("2: %s s" % (time.time() - start_time))

for i in range(N_bins):
    print(np.allclose(np.array(binned_list1[i]), np.array(binned_list2[i])))

Output:

1: 0.039897918701171875 s
2: 0.01396489143371582 s
True
True
True
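
A single time.time() measurement can be noisy for runs this short. As a rough sketch, the same comparison could be repeated with timeit.repeat and the best run reported. The function names per_element and per_bin_mask are hypothetical wrappers around the two loops above, and np.hypot and np.random.default_rng are used here in place of the original modulus and seeding code.

```python
import timeit
import numpy as np

N_bins = 3
Bins = np.linspace(0, 10, N_bins + 1)
rng = np.random.default_rng(1)
x = rng.random((10, 10000, 2)) / np.sqrt(2) * 10   # same shape as the benchmark above

def per_element(x):
    """Append row by row (the approach from the question)."""
    out = [[] for _ in range(N_bins)]
    for xk in x:
        dig = np.digitize(np.hypot(xk[:, 0], xk[:, 1]), bins=Bins) - 1
        for row, d in zip(xk, dig):
            out[d].append(row)
    return out

def per_bin_mask(x):
    """Select rows with one boolean mask per bin (the approach from answer #1)."""
    out = [[] for _ in range(N_bins)]
    for xk in x:
        dig = np.digitize(np.hypot(xk[:, 0], xk[:, 1]), bins=Bins) - 1
        for i in range(N_bins):
            out[i].extend(xk[dig == i])
    return out

for fn in (per_element, per_bin_mask):
    # Best of 5 single runs, which reduces the effect of one-off system noise.
    best = min(timeit.repeat(lambda: fn(x), number=1, repeat=5))
    print(f"{fn.__name__}: {best:.4f} s")
```

The absolute timings depend on the machine and NumPy version, so only the ratio between the two is meaningful.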
