numpy: Is there a vectorized way to sample multiple times with np.random.choice, using a different p for each sample?

Posted by yqyhoc1h on 2022-11-10 in Other

I'm trying to implement the variation ratio, and I need T samples from an array C, but each sample has its own weights p_t.
This is what I'm using:

import numpy as np
from scipy import stats

batch_size = 1
T = 3
C = np.array(['A', 'B', 'C'])

# p_batch_T dimensions: (batch, sample, class)

p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3,  0.15, 0.55],
                       [0.85, 0.1,  0.05]]])

def variation_ratio(C, p_T):
  # This function works only with one sample from the batch.
  Y_T = np.array([np.random.choice(C, size=1, p=p_t) for p_t in p_T]) # vectorize this
  C_mode, f = stats.mode(Y_T)  # f: how often the most frequent class occurs
  T = len(Y_T)
  return 1.0 - (f/T)

def variation_ratio_batch(C, p_batch_T):
  return np.array([variation_ratio(C, p_T) for p_T in p_batch_T]) # and vectorize this

Is there a way to implement these functions without for loops?


r7knjye2 #1

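Instead of sampling with the given distribution p_T, we can sample uniformly on [0, 1) and compare the draws against the cumulative distribution. For each draw, the sampled class is the first index whose cumulative probability exceeds it.
Let's start with Y_T, say for p_T = p_batch_T[0]: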

cum_dist = p_batch_T.cumsum(axis=-1)

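# one uniform draw per row of p_T (T draws, not len(C)); argmax picks the first True
idx_T = (np.random.rand(len(cum_dist[0]), 1) < cum_dist[0]).argmax(-1)
Y_T = C[idx_T]            # sampled labels, shape (T,)
_, f = stats.mode(idx_T)  # here axis=0 is default; the mode count of the indices equals that of the labels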
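As a quick sanity check of the idea above, the following sketch draws many samples for a single probability row and compares the empirical frequencies with p. The seed, the sample count, and the names `u`, `cdf`, and `freq` are illustrative choices, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)    # seed chosen only for reproducibility

p = np.array([0.3, 0.15, 0.55])   # one row of p_T from the question
cdf = p.cumsum()                  # cumulative distribution of that row

# Draw many uniforms; for each one, pick the first index with u < cdf.
u = rng.random((100_000, 1))
idx = (u < cdf).argmax(axis=-1)

# The empirical class frequencies should be close to p.
freq = np.bincount(idx, minlength=len(p)) / len(idx)
print(freq)
```

If the frequencies roughly match p, comparing uniform draws against the cumulative sum behaves like np.random.choice with p=p.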

Now let's look at variation_ratio_batch:

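# one uniform draw per (batch, sample) pair; result has shape (batch, T)
idx_T = (np.random.rand(*p_batch_T.shape[:-1], 1) < cum_dist).argmax(-1)

Y = C[idx_T]   # sampled labels, shape (batch, T)

_, f = stats.mode(idx_T, axis=1)   # notice axis 0 is batch; stats.mode returns (mode, count)

T = p_batch_T.shape[1]
out = 1 - (f/T)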
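Putting the pieces together, here is a minimal sketch of a loop-free batch version. The function name `variation_ratio_batch_vec` and its `rng` parameter are hypothetical, and the per-class counting (a one-hot comparison instead of stats.mode) is a swapped-in technique rather than the answer's own code.

```python
import numpy as np

def variation_ratio_batch_vec(p_batch_T, rng=None):
    """Hypothetical helper. p_batch_T has shape (batch, T, n_classes)."""
    rng = np.random.default_rng() if rng is None else rng
    p_batch_T = np.asarray(p_batch_T, dtype=float)
    T, n_classes = p_batch_T.shape[1], p_batch_T.shape[2]

    # Inverse-CDF sampling: first class whose cumulative probability exceeds u.
    cum_dist = p_batch_T.cumsum(axis=-1)
    u = rng.random(p_batch_T.shape[:-1] + (1,))
    idx = (u < cum_dist).argmax(axis=-1)                  # (batch, T)

    # Count how often each class was drawn, then take the largest count.
    counts = (idx[..., None] == np.arange(n_classes)).sum(axis=1)  # (batch, n_classes)
    f = counts.max(axis=-1)                               # (batch,)
    return 1.0 - f / T

p_batch_T = np.array([[[0.01, 0.98, 0.01],
                       [0.3,  0.15, 0.55],
                       [0.85, 0.1,  0.05]]])
vr = variation_ratio_batch_vec(p_batch_T, np.random.default_rng(0))
print(vr)
```

Counting on the integer indices sidesteps any question of whether stats.mode accepts string labels. The labels themselves are still available as C[idx] if needed.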

ozxc1zmp #2

You can do it like this:
First, create a 2-D array of weights of shape (T, len(C)), normalize each row, and take the cumulative sum:

n_rows = 5
n_cols = 3

weights = np.random.rand(n_rows, n_cols) 
cum_weights = (weights / weights.sum(axis=1, keepdims=True)).cumsum(axis=1)

cum_weights might look like this:

array([[0.09048919, 0.58962127, 1.        ],
       [0.36333997, 0.58380885, 1.        ],
       [0.28761923, 0.63413879, 1.        ],
       [0.39446498, 0.98760834, 1.        ],
       [0.27862476, 0.79715149, 1.        ]])

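Next, we compare cum_weights against the output of np.random.rand with a suitable shape (one random number per row). The comparison is True wherever the cumulative weight is still below the random number, so argmin returns, for each row, the first index at which the cumulative weight is no longer below it: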

indices = (cum_weights < np.random.rand(n_rows, 1)).argmin(axis=1)

We can then use indices to index an array of values of shape (n_cols,), which is C in your original example (so n_cols = len(C)).
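The last step can be sketched as follows, using the question's C and p_T instead of random weights. Setting the final cumulative entry to exactly 1 is an added precaution against floating-point round-off, not something the answer itself does.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility

C = np.array(['A', 'B', 'C'])
p_T = np.array([[0.01, 0.98, 0.01],
                [0.3,  0.15, 0.55],
                [0.85, 0.1,  0.05]])

cum_weights = p_T.cumsum(axis=1)
cum_weights[:, -1] = 1.0         # precaution: make the last entry exactly 1

u = rng.random((len(p_T), 1))    # one uniform draw per row, in [0, 1)
indices = (cum_weights < u).argmin(axis=1)  # first column with cum_weights >= u

Y_T = C[indices]                 # one sampled label per row
print(indices, Y_T)
```

Without that precaution, a row whose cumulative sum ends slightly below 1 could compare as True everywhere for a draw very close to 1, and argmin would then return index 0.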


o8x7eapl #3

np.vectorize should work:

from functools import partial
import numpy as np

@partial(np.vectorize, excluded=['rng'], signature='(),(k)->()')
def choice_batched(rng, probs):
  return rng.choice(a=probs.shape[-1], p=probs)

Then:

num_classes = 3
batch_size = 5
alpha = .5  # Dirichlet prior hyperparameter.

rng = np.random.default_rng()

probs = np.random.dirichlet(alpha=np.full(fill_value=alpha, shape=num_classes), size=batch_size)

# Check each row sums to 1.

assert np.allclose(probs.sum(axis=-1), 1)

print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))
print(choice_batched(rng, probs))

gives:

[2 0 0 0 1]
[1 0 0 0 1]
[2 0 2 0 1]
[1 0 0 0 0]

yeotifhr #4

Here is my implementation of Quang's and Gmds' solutions:

def sample(ws, k):
    """Weighted sample k elements along the last axis.
    ws -- Tensor of probabilities, shape (*, n)
    k  -- Number of elements to sample.
    Returns tensor of shape (*, k) with values in {0, ..., n-1}.
    """
    assert np.allclose(ws.sum(-1), 1)
    cs = ws.cumsum(-1)
    ps = np.random.random(ws.shape[:-1] + (k,))
    return (cs[..., None, :] < ps[..., None]).sum(-1)

Say we have some stuff

>>> stuff = array([[0, 1, 2],
                   [3, 4, 5],
                   [6, 7, 8]])

and some weights / sampling probabilities.

>>> ws = array([[0.41296038, 0.36070229, 0.22633733],
                [0.37576672, 0.14518771, 0.47904557],
                [0.14742326, 0.29182459, 0.56075215]])

We want to sample 2 elements along each row, so we do

>>> ids = sample(ws, 2)
[[2, 0],
 [1, 2],
 [2, 2]]

We can use np.take_along_axis to retrieve the sampled values from stuff:

>>> np.take_along_axis(stuff, ids, axis=-1)
[[2, 0],
 [4, 5],
 [8, 8]]

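The code could be generalized to sample along an axis other than the last one, but I got confused by the broadcasting, so someone else should give it a try!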
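One possible way to attempt that generalization is to move the target axis to the end with np.moveaxis, sample as before, and move the result back. This is a sketch under that assumption. The name `sample_along_axis`, the `rng` parameter, and the final clamp against round-off are all hypothetical additions rather than part of the answer.

```python
import numpy as np

def sample_along_axis(ws, k, axis=-1, rng=None):
    """Hypothetical generalization of sample(): draw k indices along `axis`.

    Returns an integer array shaped like ws, except that `axis` has length k.
    """
    rng = np.random.default_rng() if rng is None else rng
    ws = np.moveaxis(np.asarray(ws, dtype=float), axis, -1)  # (..., n)
    assert np.allclose(ws.sum(-1), 1)
    cs = ws.cumsum(-1)
    ps = rng.random(ws.shape[:-1] + (k,))                     # (..., k)
    # Number of cumulative values below each draw = sampled index.
    ids = (cs[..., None, :] < ps[..., None]).sum(-1)          # (..., k)
    ids = np.minimum(ids, ws.shape[-1] - 1)  # clamp in case cs[..., -1] < 1
    return np.moveaxis(ids, -1, axis)

ws = np.array([[0.41296038, 0.36070229, 0.22633733],
               [0.37576672, 0.14518771, 0.47904557],
               [0.14742326, 0.29182459, 0.56075215]])
stuff = np.arange(9).reshape(3, 3)

# Treat the columns of ws.T as the distributions and sample along axis 0.
ids = sample_along_axis(ws.T, 2, axis=0, rng=np.random.default_rng(0))
picked = np.take_along_axis(stuff.T, ids, axis=0)
print(ids.shape, picked.shape)
```

Here `ids` has shape (2, 3): the sampled axis is replaced by an axis of length k, and np.take_along_axis is called with the same axis to look the values up.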
