pandas: Distribution of elements according to percentage frequency

Posted by o8x7eapl on 2022-12-16 in Other


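Is there a function in pandas, numpy, or plain Python that generates a frequency distribution from percentage values, the way EnumeratedDistribution is used in Java?
Input:

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

Output:

[0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

Out of 10 elements in total, 50% are 0, 30% are 1, and 20% are 2.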

pandas

Source: https://stackoverflow.com/questions/59625782/distribution-of-elements-according-to-percentage-frequency

3 Answers


5gfr0r5j #1

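You can use numpy's repeat() function to repeat each entry of values a given number of times (percentage * total):

import numpy as np

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 11

# Round each share of the total to a whole repeat count.
# A plain int dtype avoids the overflow np.int8 would hit for counts above 127.
repeats = np.around(np.array(percentage) * total).astype(int)  # [6, 3, 2]
np.repeat(values, repeats)

Output:

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])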

I use np.around() to round the repeat counts in case they are not integers (for example, with a total of 11: 11*0.5 -> 6, 11*0.3 -> 3, and 11*0.2 -> 2).
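One detail of np.around() matters here: it rounds exact halves to the nearest even integer, so the rounded counts do not necessarily add up to total. The following is a minimal sketch of that effect; the two-way 50/50 split and the totals 9 and 11 are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed shares, picked so that every product lands exactly on a half.
percentage = np.array([0.5, 0.5])

# 9 * 0.5 = 4.5 rounds down to the even integer 4 for both entries.
under = np.around(percentage * 9).astype(int)

# 11 * 0.5 = 5.5 rounds up to the even integer 6 for both entries.
over = np.around(percentage * 11).astype(int)

print(under.tolist(), int(under.sum()))  # [4, 4] 8   (one element short of 9)
print(over.tolist(), int(over.sum()))    # [6, 6] 12  (one element more than 11)
```

If the output length has to equal total exactly, the counts need a correction step after rounding.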

Answered on 2022-12-16

hjqgdpho #2

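Without numpy, using only a list comprehension:

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

# Build one sub-list per value, then concatenate them with sum(..., []).
# round() is used instead of int() so that a product such as 28.999999999999996
# is not truncated to 28.
output = sum([[e] * round(total * p) for e, p in zip(values, percentage)], [])
# [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]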
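Concatenating with sum(..., []) copies the accumulated list at every step, which gets slow when there are many distinct values. A possible variant, sketched here with the standard library's itertools.chain (not part of the original answer), flattens the sub-lists in a single pass:

```python
from itertools import chain

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

# One short list per value, flattened lazily instead of summed pairwise.
output = list(
    chain.from_iterable([v] * round(total * p) for v, p in zip(values, percentage))
)
print(output)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

As with the list comprehension, the result length only matches total when the rounded counts happen to add up to it.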

Answered on 2022-12-16

kx7yvsdv #3

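@Andreas K's solution is good, but it still has a problem: the size of the result is not always equal to the original total. For example, [27.3, 36.4, 27.3], which sums to 91, becomes [27, 36, 27] after rounding, which sums to 90.
I prefer this better way of rounding, lightly adapted from this post: https://stackoverflow.com/a/74044227/3789481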

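import numpy as np

def round_retain_sum(x: np.ndarray) -> np.ndarray:
    # Round the non-negative values in x to integers while keeping their rounded sum.
    N = np.round(np.sum(x)).astype(int)  # target integer sum
    y = x.astype(int)                    # truncate every value
    M = np.sum(y)
    K = N - M                            # units lost to truncation
    z = y - x                            # negated fractional parts
    if K != 0:
        # The K smallest entries of z are the K largest fractional parts.
        # kth=K-1 stays in bounds even when K equals len(x).
        idx = np.argpartition(z, K - 1)[:K]
        y[idx] += 1
    return y

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 11
repeats = round_retain_sum(np.array(percentage) * total)  # [6, 3, 2]
np.repeat(values, repeats)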
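The same idea is often called the largest remainder method: round every count down, then hand the missing units to the entries with the largest fractional parts. Below is a self-contained sketch applied to the [27.3, 36.4, 27.3] example; the helper name largest_remainder_counts is hypothetical, and it assumes non-negative shares with a positive sum.

```python
import numpy as np

def largest_remainder_counts(shares, total):
    """Hypothetical helper: integer counts proportional to shares, summing to total."""
    shares = np.asarray(shares, dtype=float)
    exact = shares / shares.sum() * total   # ideal, possibly fractional counts
    counts = np.floor(exact).astype(int)    # round everything down first
    shortfall = int(total - counts.sum())   # units still to hand out
    if shortfall > 0:
        # Indices ordered by fractional part, largest first.
        order = np.argsort(exact - counts)[::-1]
        counts[order[:shortfall]] += 1
    return counts

counts = largest_remainder_counts([27.3, 36.4, 27.3], 91)
print(counts.tolist(), int(counts.sum()))  # [27, 37, 27] 91

# The counts can then drive np.repeat as in the answers above.
result = np.repeat([0, 1, 2], counts)
print(len(result))  # 91
```

Normalizing by shares.sum() lets the helper accept either fractions or percentages, which is a design choice of this sketch rather than something taken from the linked post.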

Answered on 2022-12-16
