numpy: Variable-length element sampling from a list based on weights

Posted by xqnpmsa8 on 2023-01-05 in Other


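I want to select some items from a list according to a given weight for each element. The length of the output is not known in advance, and this has to be done many times.
Suppose I have a list of [id, probability] pairs such as [[1, 0.2], [2, 0.3], [3, 0.45], [4, 0.05], [5,1.]]; I want to get something like [[1,3,5], [5], [3,5], [5], [1,2,4,5], ...].
Below is my code. It works, but it is very slow (the list is long, more than 10,000 [id, probability] elements, and my result contains thousands of selection elements). Do you know of any way to make it faster?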

import numpy as np

items  = [[1, 0.2], [2, 0.3], [3, 0.45], [4, 0.05], [5,1.]]

combinations = []
for n in range(1000):
    selection = []
    for i in items:
        chosen = np.random.choice([True, False], p=[i[1], 1.-i[1]])
        if chosen:
            selection.append(i[0])
    combinations.append(selection)

numpy

Source: https://stackoverflow.com/questions/60382978/variable-length-element-sampling-from-list-based-on-weights

1 Answer


8yparm6h1#

You can vectorize the sampling step as follows:

import numpy as np

# items  = [[1, 0.2], [2, 0.3], [3, 0.45], [4, 0.05], [5,1.]]
items = [(i, np.random.rand()) for i in range(1000)]

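def sample_original(itms, n=1000):
    # Baseline: one np.random.choice call per item per sample.
    combinations = []
    for _ in range(n):
        selection = []
        for i in itms:
            chosen = np.random.choice([True, False], p=[i[1], 1.-i[1]])
            if chosen:
                selection.append(i[0])
        combinations.append(selection)
    return combinations

def sample_numpy(itms, n=1000):
    # Split the [id, probability] pairs into two arrays (the ids become floats here).
    elts, probs = np.array(itms).T
    m = len(elts)
    # rand(m) < probs is a boolean mask; entry j is True with probability probs[j].
    return [elts[np.random.rand(m) < probs] for _ in range(n)]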

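The key observation is that np.random.rand(m) < probs yields a random True/False vector that selects each element of the original list with the correct probability. With 1000 elements, it appears to be roughly 1000 times faster: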

%timeit sample_numpy(items)
%timeit sample_original(items)
10 loops, best of 3: 22.6 ms per loop
1 loop, best of 3: 21.9 s per loop
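
To see why the comparison selects elements with the right probability, here is a small sanity check on the five-item list from the question. It is only a sketch: the names ids, probs, n_trials and empirical are mine, and the seed and trial count are arbitrary. It uses np.random.default_rng instead of np.random.rand, which should behave the same for this purpose.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Keep ids and probabilities in separate arrays so the ids stay integers.
ids = np.array([1, 2, 3, 4, 5])
probs = np.array([0.2, 0.3, 0.45, 0.05, 1.0])

# One draw: uniform numbers in [0, 1) fall below probs[j] with probability probs[j].
mask = rng.random(len(probs)) < probs
selection = ids[mask]

# Repeat the draw and count how often each item is picked.
n_trials = 20_000
counts = np.zeros(len(probs))
for _ in range(n_trials):
    counts += rng.random(len(probs)) < probs

# Observed selection frequency per item; it should be close to probs.
empirical = counts / n_trials
```

Item 5 has probability 1.0, so it shows up in every selection, because the uniform draws are always strictly below 1.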

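If the probabilities are relatively high (so the resulting selections are not sparse), you may want to store the selection indicators in a 2D array to improve performance further.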
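
One way the 2D-array idea could look is sketched below. The function names sample_mask_2d and mask_to_lists are hypothetical, not part of NumPy or of the answer's code, and whether this beats the list comprehension will depend on n, the list length and the probabilities, so it is worth timing on your own data.

```python
import numpy as np

def sample_mask_2d(probs, n=1000, rng=None):
    """Return an (n, m) boolean array; row r is the indicator vector of sample r."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # A single call draws all n * m uniforms; probs broadcasts across the rows.
    return rng.random((n, len(probs))) < probs

def mask_to_lists(ids, mask):
    """Convert the indicator array into one variable-length id array per row."""
    ids = np.asarray(ids)
    # np.nonzero walks the array in row-major order, so cols is grouped by row.
    _, cols = np.nonzero(mask)
    # Number of selected items per row gives the split points.
    counts = mask.sum(axis=1)
    return np.split(ids[cols], np.cumsum(counts)[:-1])

# Usage with the list from the question.
items = [[1, 0.2], [2, 0.3], [3, 0.45], [4, 0.05], [5, 1.0]]
ids = np.array([i for i, _ in items])
probs = np.array([p for _, p in items])

mask = sample_mask_2d(probs, n=1000, rng=np.random.default_rng(0))
combinations = mask_to_lists(ids, mask)
```

If the downstream code can work with the boolean array directly, for example mask.sum(axis=0) to count how often each id was picked, the conversion to lists can be skipped altogether.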
