pandas: add a categorical column with specific counts

Posted by ctehm74n on 2022-11-27 in Other


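I am trying to create a new categorical country column in which each country makes up a specific percentage of the rows. Take the following dataset as an example: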

import numpy as np
import seaborn as sns

df = sns.load_dataset("titanic")

I am trying to create the new column with the following script:

country = ['UK', 'Ireland', 'France']

df["country"] = np.random.choice(country, len(df))

df["country"].value_counts(normalize=True)

UK         0.344557
Ireland    0.328844
France     0.326599

However, this gives every country roughly the same share of the rows. I want a specific proportion for each country:

Desired output:

df["country"].value_counts(normalize=True)

UK         0.91
Ireland    0.06
France     0.03

What is the best way to get the desired output? Any suggestions would be appreciated. Thanks!

pandas

Source: https://stackoverflow.com/questions/74533638/add-categorical-column-with-specific-count

1 Answer


dbf7pr2w1#

Why not set the probabilities in numpy.random.choice? Its p parameter gives the weight of each entry in country:

df["country"] = np.random.choice(country, len(df), p=[0.91, 0.06, 0.03])
df["country"].value_counts(normalize=True)

Output:

UK         0.902357
Ireland    0.058361
France     0.039282
Name: country, dtype: float64
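The proportions above only approximate p because every row is an independent draw, so they change from run to run. A minimal sketch of the same weighted draw using NumPy's Generator API with a fixed seed, so the result is reproducible. The seed value 42 and the synthetic 891-row frame (a stand-in for the Titanic data, which has 891 rows) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

country = ["UK", "Ireland", "France"]
p = [0.91, 0.06, 0.03]

# Synthetic stand-in for the Titanic frame (891 rows); only the row count matters here
df = pd.DataFrame({"id": range(891)})

# A seeded Generator makes the weighted draw reproducible across runs
rng = np.random.default_rng(42)
df["country"] = rng.choice(country, size=len(df), p=p)

# Proportions are close to p but not exact: each row is drawn independently
print(df["country"].value_counts(normalize=True))
```

Use this when approximate proportions are acceptable; the exact-count approach below is needed when the shares must match p as closely as the row count allows.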

If you need exact value counts (up to rounding):

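p = [0.91, 0.06, 0.03]
# Number of rows per country; the sum MUST equal len(df)
r = (np.array(p) * len(df)).round().astype(int)
# or set the counts directly:
# r = [811, 53, 27]
assert sum(r) == len(df), "adjust r so the counts add up to len(df)"

a = np.repeat(country, r)  # e.g. 811 x 'UK', 53 x 'Ireland', 27 x 'France'
np.random.shuffle(a)       # shuffle in place so the labels land on random rows

df['country'] = a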

df["country"].value_counts(normalize=True)

Output:

UK         0.910213
Ireland    0.059484
France     0.030303
Name: country, dtype: float64
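Plain rounding, as used for r above, can leave the counts one or more rows short of (or over) len(df) for some lengths and probabilities; for example, 10 rows split as [1/3, 1/3, 1/3] rounds to 3 + 3 + 3 = 9. A minimal sketch that avoids this with the largest-remainder method (floor every count, then give the leftover rows to the categories with the largest fractional parts), storing the result with pandas' categorical dtype. The helper name exact_counts, the seed, and the synthetic 891-row frame are hypothetical.

```python
import numpy as np
import pandas as pd

def exact_counts(p, n):
    """Hypothetical helper: split n rows into integer counts proportional to p
    that always sum to n (largest-remainder method)."""
    raw = np.asarray(p, dtype=float) * n
    counts = np.floor(raw).astype(int)
    leftover = n - counts.sum()
    # Give one extra row to each category with the largest fractional part
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts

country = ["UK", "Ireland", "France"]
p = [0.91, 0.06, 0.03]

df = pd.DataFrame({"id": range(891)})  # stand-in for the 891-row Titanic frame

r = exact_counts(p, len(df))           # [811, 53, 27] for 891 rows
a = np.repeat(country, r)
rng = np.random.default_rng(0)
rng.shuffle(a)                         # randomize which rows get which label

# Store as a categorical column with a fixed category order
df["country"] = pd.Categorical(a, categories=country)
print(df["country"].value_counts())
```

For 891 rows this yields the same counts as the answer's r, but it also works for row counts where plain rounding would not add up.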

Answered 2022-11-27
