Numpy `random.choice` but using different weights each time we sample

Posted by yyyllmsg on 2023-10-19 in Other


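I have an (N, T+1) array of weights W. Its rows are normalized, meaning that

np.allclose(W.sum(axis=1), np.ones(N))

returns True (exact equality via np.array_equal can fail because of floating-point rounding). Now I want to draw N samples from np.arange(T+1), where the i-th sample is drawn using the i-th row of W as its weights. I can of course do this with a for loop: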

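import numpy as np

# Settings
N = 100
T = 20

# Create some normalized weights (each row sums to 1)
W = np.random.rand(N, T+1)
W = W / W.sum(axis=1, keepdims=True)

# Use a for loop to sample: row i of W holds the weights for sample i
samples = np.zeros(N, dtype=int)  # samples are indices, so store them as integers
for i in range(N):
    # omitting size returns a scalar instead of a length-1 array
    samples[i] = np.random.choice(a=np.arange(T+1), p=W[i, :])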

However, I'm wondering whether there is a way to do this in numpy/scipy, or with some other library. I'd hope for something like:

# or perhaps a=np.repeat(np.arange(T+1).reshape(-1,1), N, axis=1).T
samples = some_function(a=np.arange(T+1), size=N, p=W)

numpy

Source: https://stackoverflow.com/questions/77045475/numpy-random-choice-but-using-different-weights-each-time-we-sample

1 answer


xmakbtuz1#

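I'm not sure whether numpy has explicit built-in support for this, but you can draw random samples from a uniform distribution and convert them into samples from the categorical distributions like this: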

F = W.cumsum(axis=1)  # get the CDF of all distributions
F[:, -1] = 1.0  # guard against rounding: without this, r[i] > F[i, -1] would make argmax silently return 0
r = np.random.rand(N)  # generate N random samples from U(0,1)
samples = np.argmax(F >= r[:, None], axis=1)  # first index where F[i] >= r[i]: one categorical sample per row
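Packaged as a reusable function, roughly the `some_function` the question asks for, the same inverse-CDF idea might look like the sketch below. The name `sample_rows` is hypothetical (it is not a NumPy function), and two details differ from the answer's snippet by assumption: it uses a `np.random.default_rng` Generator so results can be seeded, and it compares with a strict `F > r`. Because `r` lies in [0, 1), the strict comparison never selects a zero-probability bin, and with the last CDF entry pinned to 1 some entry is always greater than `r`.

```python
import numpy as np

def sample_rows(W, rng=None):
    """Draw one index per row of W, using row i as the weights for sample i.

    Hypothetical helper sketching the inverse-CDF method; not a NumPy API.
    """
    rng = np.random.default_rng() if rng is None else rng
    W = np.asarray(W, dtype=float)
    F = W.cumsum(axis=1)          # per-row CDF (a new array, W is not modified)
    F[:, -1] = 1.0                # pin the last entry so rounding cannot leave it below r
    r = rng.random(W.shape[0])    # one U[0, 1) draw per row
    # first index with F > r, i.e. F[s-1] <= r < F[s], which has probability p[s]
    return np.argmax(F > r[:, None], axis=1)
```

For example, `samples = sample_rows(W, np.random.default_rng(0))` gives one reproducible sample per row of `W`.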

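All of the above operations are vectorized, and samples[i] is a sample from the distribution W[i]. I'll try to explain why and how this works.
Suppose you have a categorical distribution p(i). To sample from it, you can take a random sample r from the uniform distribution U(0,1) together with the CDF F(i), which satisfies F[i] - F[i-1] = p[i]: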

import numpy as np

p = np.array([0.25, 0.5, 0.25])  # example categorical distribution, i.e. p[i] = probability of i
F = p.cumsum()  # get the CDF
r = np.random.rand()  # a random sample from U(0,1)
s = np.argmax(F >= r)  # first index where F >= r = a random sample from p, the categorical distribution
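The rule "first index where F >= r" is exactly what `np.searchsorted(F, r, side="left")` computes on a sorted 1-D array, using binary search instead of a full scan. The sketch below (the distribution and seed are illustrative choices) checks that the two agree on many draws, and that a bin with zero probability is never chosen. Note that `searchsorted` only accepts a single 1-D sorted array, so for a different CDF per row it has to be called row by row.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.25, 0.0, 0.5, 0.25])  # illustrative distribution with a zero-probability bin
F = p.cumsum()                         # CDF: [0.25, 0.25, 0.75, 1.0]
r = rng.random(1000)                   # 1000 draws from U[0, 1)

# the answer's rule: scan for the first index with F >= r (one row per draw)
via_argmax = np.argmax(F[None, :] >= r[:, None], axis=1)
# same rule via binary search: side="left" returns the first i with F[i] >= r
via_search = np.searchsorted(F, r, side="left")
```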

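This works because s satisfies F[s-1] < r <= F[s], since s is the first index where F >= r (take F[s-1] = 0 when s = 0). Hence P(this event) = P(F[s-1] < r <= F[s]) = (F[s] - F[s-1])/1 = p[s]: the probability that r lands in this interval is exactly the probability of drawing s.
In fact, with the vectorized form, the code now generalizes to generating K samples for each i: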
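A small worked instance of this argument, with an illustrative three-outcome distribution: the intervals (F[s-1], F[s]] partition (0, 1] and their lengths reproduce p, so a uniform r falls into interval s with probability p[s]. For r = 0.6 we have 0.25 < 0.6 <= 0.75, so s = 1.

```python
import numpy as np

p = np.array([0.25, 0.5, 0.25])           # illustrative distribution
F = p.cumsum()                             # [0.25, 0.75, 1.0]
F_prev = np.concatenate(([0.0], F[:-1]))   # F[s-1], taking 0 for s = 0
lengths = F - F_prev                       # length of each interval (F[s-1], F[s]]

r = 0.6
s = np.argmax(F >= r)                      # 0.25 < 0.6 <= 0.75, so s = 1
```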

K = 10000
r = np.random.rand(N, K)
samples = np.argmax(F[:, :, None] >= r[:, None, :], axis=1)
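The broadcast comparison above materializes an N x (T+1) x K boolean array; for N=100, T=20, K=10000 that is 21 million one-byte entries, about 21 MB. If that grows too large, one alternative (a swap-in, not part of the answer) is to loop over the N rows and use `np.searchsorted`, vectorized over the K draws, so memory per row is only O(T + K). The helper name `sample_rows_k` is hypothetical. It uses `side="right"`, which returns the first index with F > r and therefore never picks a zero-probability bin.

```python
import numpy as np

def sample_rows_k(W, K, rng=None):
    """Draw K samples per row of W. Hypothetical helper, not a NumPy API."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.asarray(W, dtype=float)
    N, C = W.shape
    out = np.empty((N, K), dtype=np.intp)
    for i in range(N):
        F = np.cumsum(W[i])      # CDF of row i
        F[-1] = 1.0              # guard against rounding in the last bin
        r = rng.random(K)        # K draws from U[0, 1)
        # side="right": number of CDF entries <= r, i.e. the first index with F > r
        out[i] = np.searchsorted(F, r, side="right")
    return out
```

The Python loop runs only N times, while each iteration is a vectorized binary search over K draws.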

Here samples[i] holds K=10000 samples from W[i]. You can also verify that the samples follow the categorical distributions we started with:

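for i in range(N):  # one row of samples per distribution W[i]
    w = np.bincount(samples[i], minlength=T+1)/K  # empirical distribution; minlength keeps length T+1 even if a category never appears
    error = ((W[i] - w)**2).sum()  # squared error between the empirical and theoretical distributions
    print(error)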

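This prints the error between the empirical distribution and the distribution we started with. The error gets smaller as the number of samples K increases, which is what we expect.
Let me know if this helps. I'm not sure how clear the explanation is!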
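To see the shrinking error directly, the sketch below repeats the answer's vectorized sampling for several K and averages the squared error over all N rows. For a row with probabilities p, the expected squared error of the empirical frequencies is (1 - sum of p[j]^2)/K, so the averages should fall roughly tenfold for each tenfold increase in K. The seed, the K values, and the helper name `mean_sq_error` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 20
W = rng.random((N, T + 1))
W /= W.sum(axis=1, keepdims=True)   # normalized rows, as in the question
F = W.cumsum(axis=1)
F[:, -1] = 1.0                      # rounding guard

def mean_sq_error(K):
    r = rng.random((N, K))
    samples = np.argmax(F[:, :, None] >= r[:, None, :], axis=1)  # shape (N, K)
    # empirical distribution of each row; minlength keeps every row length T + 1
    emp = np.stack([np.bincount(s, minlength=T + 1) / K for s in samples])
    return ((W - emp) ** 2).sum(axis=1).mean()

errors = {K: mean_sq_error(K) for K in (100, 1000, 10000)}
for K, e in errors.items():
    print(K, e)
```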

Posted 2023-10-19
