pandas: how to assign a random normal value to each cell

ogsagwnx  posted on 2023-06-28  in Other

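I have a dataset based on a 5-point Likert scale. I want to convert each cell into a normally distributed value with a predetermined mean and standard deviation. My current code is below.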

import random
Mu={1:0.021,2:0.146,3:0.375,4:0.625,5:0.979}
std={1:0.021,2:0.104,3:0.125,4:0.125,5:0.021}

#defining the random dictionary 
rnd={1:random.normalvariate(Mu[1], std[1]),
    2:random.normalvariate(Mu[2], std[2]),
    3:random.normalvariate(Mu[3], std[3]),
    4:random.normalvariate(Mu[4], std[4]),
    5:random.normalvariate(Mu[5], std[5])}

raw_data_rnd=raw_data.copy()

for col in raw_data_rnd.columns:
    raw_data_rnd[col].mask(raw_data_rnd[col]==1,random.normalvariate(Mu[1],std[1]),inplace=True)
    raw_data_rnd[col].mask(raw_data_rnd[col]==2,random.normalvariate(Mu[2],std[2]),inplace=True)
    raw_data_rnd[col].mask(raw_data_rnd[col]==3,random.normalvariate(Mu[3],std[3]),inplace=True)
    raw_data_rnd[col].mask(raw_data_rnd[col]==4,random.normalvariate(Mu[4],std[4]),inplace=True)
    raw_data_rnd[col].mask(raw_data_rnd[col]==5,random.normalvariate(Mu[5],std[5]),inplace=True)
raw_data_rnd

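The code works. However, it assigns the same value to every cell where the condition is true. What I need is for the code to repeat the random draw on each assignment and put a different value in each cell. In other words, every time the DataFrame holds a 1, for example, I need the code to assign a new random value.
Can anyone help?
I have tried several approaches, but I keep hitting dead ends.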

kr98yfug  #1

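IIUC, you can use np.unique to get the count of each value (between 1 and 5), then use np.where to replace those cells with random numbers drawn by np.random.normal according to the Mu / std parameters: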

import pandas as pd
import numpy as np
# Mu and std as defined in the question (mean and standard deviation per Likert level)
Mu = {1: 0.021, 2: 0.146, 3: 0.375, 4: 0.625, 5: 0.979}
std = {1: 0.021, 2: 0.104, 3: 0.125, 4: 0.125, 5: 0.021}

# For reproducible result
rng = np.random.default_rng(42)

# Sample dataframe: 4 columns, 20 rows, values between 1 and 9
df = pd.DataFrame(rng.integers(1, 10, (20, 4)), columns=list('ABCD'))

# DON'T KEEP THIS STEP
df = df.astype(float)  # important to cast dataframe as float
arr = df.values  # get a "view" of the dataframe

# Count the occurrences of each distinct value in the array
dmap = dict(zip(*np.unique(arr, return_counts=True)))

# Replace each value from 1 to 5 with its own independent normal draw
for i in range(1, 6):
    rnd = rng.normal(Mu[i], std[i], dmap.get(i, 0))  # no draws if i is absent
    arr[np.where(arr == i)] = rnd

Output:

# After processing
>>> df
           A         B         C         D
0   0.036608  7.000000  6.000000  0.583139
1   0.645344  8.000000  0.032406  7.000000
2   0.216607  0.007024  0.955201  9.000000
3   7.000000  7.000000  7.000000  8.000000
4   0.959692  0.153028  8.000000  0.989440
5   0.981991  0.698278  0.176068  9.000000
6   8.000000  6.000000  0.713903  8.000000
7   0.993500  0.724168  0.970028  0.340607
8   0.025875  0.982329  8.000000  0.023450
9   8.000000  8.000000  0.561868  6.000000
10  0.211654  7.000000  7.000000  0.581409
11  0.025592  9.000000  0.992137  9.000000
12  7.000000  8.000000  7.000000 -0.005544
13  0.567206  0.972504  0.988592  0.039300
14  0.965100  0.112754  7.000000  7.000000
15  9.000000  7.000000  0.732247  9.000000
16  0.601087  0.266771  9.000000  0.465539
17  0.025696  0.971376  8.000000  0.097081
18  0.970984  0.079557  7.000000  0.953887
19  0.496035  0.164641  6.000000  7.000000

>>> dmap
{1.0: 8, 2.0: 8, 3.0: 5, 4.0: 10, 5.0: 14, 6.0: 4, 7.0: 14, 8.0: 10, 9.0: 7}
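
Note that the draws are not clipped: the -0.005544 in row 12, column D above is a legitimate sample for level 2 (mean 0.146, std 0.104). If you want to convince yourself that each level follows its own distribution, one option is to compare the per-level sample mean and standard deviation against Mu and std. The following is only a rough sketch on synthetic data; the `levels` array, the sample size and the seed are made up for illustration and do not come from the question.

```python
import numpy as np

Mu = {1: 0.021, 2: 0.146, 3: 0.375, 4: 0.625, 5: 0.979}
std = {1: 0.021, 2: 0.104, 3: 0.125, 4: 0.125, 5: 0.021}

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only

# Hypothetical Likert answers: 50,000 integers between 1 and 5
levels = rng.integers(1, 6, size=50_000)

# One independent draw per cell, using the parameters of that cell's level
draws = np.empty(levels.shape, dtype=float)
for level in Mu:
    mask = levels == level
    draws[mask] = rng.normal(Mu[level], std[level], size=int(mask.sum()))

# Sample mean and standard deviation of the draws, grouped by original level
summary = {
    level: (draws[levels == level].mean(), draws[levels == level].std())
    for level in Mu
}
for level, (m, s) in summary.items():
    print(f"level {level}: mean={m:.3f} (target {Mu[level]}), "
          f"std={s:.3f} (target {std[level]})")
```

With this many samples the printed means and standard deviations should sit close to the targets. If your downstream analysis needs values inside [0, 1], np.clip is one possible fix, but keep in mind that clipping distorts the distribution near the bounds.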

Input data:

# Before processing
>>> df
    A  B  C  D
0   1  7  6  4
1   4  8  1  7
2   2  1  5  9
3   7  7  7  8
4   5  2  8  5
5   5  4  2  9
6   8  6  4  8
7   5  4  5  3
8   1  5  8  1
9   8  8  3  6
10  2  7  7  4
11  1  9  5  9
12  7  8  7  2
13  4  5  5  1
14  5  2  7  7
15  9  7  4  9
16  4  3  9  4
17  1  5  8  2
18  5  2  7  5
19  3  3  6  7
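
For context on why the loop in the question fills every matching cell with the same number: random.normalvariate(...) is evaluated once, before mask is called, so mask only ever receives a single scalar. The answer above avoids this by drawing as many values as there are matching cells. It does, however, write through df.values, which as far as I know only updates the DataFrame when the frame holds a single float dtype and copy-on-write is not enabled. Below is a variant that builds a new DataFrame instead of relying on a view. Treat it as a sketch under those assumptions; the helper name `randomize_likert` and the small `demo` frame are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

Mu = {1: 0.021, 2: 0.146, 3: 0.375, 4: 0.625, 5: 0.979}
std = {1: 0.021, 2: 0.104, 3: 0.125, 4: 0.125, 5: 0.021}


def randomize_likert(frame, mu, sd, rng):
    """Return a float copy of `frame` where every cell whose value is a key
    of `mu` is replaced by its own independent normal draw."""
    values = frame.to_numpy()
    out = values.astype(float)  # astype copies, so `frame` is left untouched
    for level in mu:
        mask = values == level
        # Draw exactly one value per matching cell (zero draws if none match)
        out[mask] = rng.normal(mu[level], sd[level], size=int(mask.sum()))
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


rng = np.random.default_rng(0)  # arbitrary seed
demo = pd.DataFrame({"A": [1, 1, 5, 7], "B": [3, 1, 5, 2]})
result = randomize_likert(demo, Mu, std, rng)
print(result)
```

Values outside the keys of Mu (the 7 in this demo) pass through unchanged, and the three cells holding 1 each receive a different draw.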
