pandas: building a MultiIndex DataFrame from dice throws

amrnrhlw  posted on 2023-08-01  in Other

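I'm playing around with pandas and numpy. A tutorial in my course sums the two numbers from a pair of dice throws. The tutorial uses pandas, but I also tried it with numpy and then compared the results.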

import numpy as np
import pandas as pd

throws = 50
diepd = pd.DataFrame([1, 2, 3, 4, 5, 6])
dienp = np.array([1,2,3,4,5,6])
np.random.seed(1) 
sum_np = [np.random.choice(dienp,2,True).sum() for i in range(throws)] 
sum_pd = [diepd.sample(2, replace=True).sum().loc[0] for i in range(throws)]

compare = pd.DataFrame(data={'sum_np': sum_np, 'sum_pd': sum_pd})

compare

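I have real difficulty understanding and manipulating MultiIndex DataFrames, so as an extra exercise I'd like to learn how to build one that holds the results, so I can compare where they differ (since I'm using the same seed).
The index would simply be the 50 throws (1 to throws). The column labels would be: level 0: two columns, the numpy results and the pandas results.
Level 1: three columns under each: the two individual throws and their sum. For example, the two values from np.random.choice(dienp,2,True) and from diepd.sample(2, replace=True), plus their respective sums.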
| Numpy | | | Pandas | | |
|--|--|--|--|--|--|
| throw 1 | throw 2 | sum | throw 1 | throw 2 | sum |
| 1 | 2 | 3 | 4 | 5 | 9 |
| 2 | 3 | 5 | 6 | 1 | 7 |
| 4 | 6 | 10 | 5 | 2 | 7 |
Any suggestions?

eit6fx6z #1

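Given how the code is written, getting the value of each individual die looks very hard without looping over the throws, rolling one die per line, and appending each value to a list inside the loop. My solution is to build two separate tables and then concatenate them. You can see my code below: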

import pandas as pd
import numpy as np

throws = 50
diepd = pd.DataFrame([1, 2, 3, 4, 5, 6])
dienp = np.array([1,2,3,4,5,6])
np.random.seed(1)
# One list each for roll 1, roll 2, and the sum
np_roll = [[], [], []]
pd_roll = [[], [], []]
for i in range(throws):
    for j in range(2):
        # Draw a single die per call so each value can be recorded
        np_roll[j].append(np.random.choice(dienp, 1, True).sum())
        pd_roll[j].append(diepd.sample(1, replace=True).sum().loc[0])
    # Sum of the two rolls for this throw
    np_roll[2].append(np_roll[0][i] + np_roll[1][i])
    pd_roll[2].append(pd_roll[0][i] + pd_roll[1][i])

np_df = pd.DataFrame(data={'Roll 1': np_roll[0], 'Roll 2': np_roll[1], "Sum": np_roll[2]})
pd_df = pd.DataFrame(data={'Roll 1': pd_roll[0], 'Roll 2': pd_roll[1], "Sum": pd_roll[2]})

compare = pd.concat([np_df, pd_df],axis=1,keys=["Numpy", "Pandas"])

pd.set_option('display.max_columns', None)
print(compare)
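
To see what the `keys` argument of `pd.concat` does to the columns, here is a minimal sketch using the three fixed rows from the question's table instead of random throws. The variable names `numpy_block`, `pandas_sum`, `both_sums`, and `sum_diff` are my own, chosen only for illustration.

```python
import pandas as pd

# Fixed illustrative values (the three rows from the question's table).
np_df = pd.DataFrame({"Roll 1": [1, 2, 4], "Roll 2": [2, 3, 6]})
np_df["Sum"] = np_df["Roll 1"] + np_df["Roll 2"]
pd_df = pd.DataFrame({"Roll 1": [4, 6, 5], "Roll 2": [5, 1, 2]})
pd_df["Sum"] = pd_df["Roll 1"] + pd_df["Roll 2"]

# keys= becomes level 0 of the column MultiIndex;
# the original column names become level 1.
compare = pd.concat([np_df, pd_df], axis=1, keys=["Numpy", "Pandas"])

# Number the throws 1..n instead of 0..n-1, as the question asks.
compare.index = pd.RangeIndex(1, len(compare) + 1, name="throw")

# Selecting from the two-level columns:
numpy_block = compare["Numpy"]                  # all three columns under "Numpy"
pandas_sum = compare[("Pandas", "Sum")]         # one column, addressed by a tuple
both_sums = compare.xs("Sum", axis=1, level=1)  # "Sum" from every level-0 group

# Difference between the two sums, throw by throw.
sum_diff = both_sums["Numpy"] - both_sums["Pandas"]
print(compare)
print(sum_diff)
```

Selecting with a level-0 label returns an ordinary single-level frame, which is usually the easiest way to work with one half of the table at a time.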


ljo96ir5 #2

This can be done as follows:

import numpy as np
import pandas as pd

arrays = [
    ["numpy", "numpy", "numpy", "pandas", "pandas", "pandas"],
    ["throw1", "throw2", "sum", "throw1", "throw2", "sum"]
]
tuples = list(zip(*arrays))
col_index = pd.MultiIndex.from_tuples(tuples)

throws = 50
diepd = pd.DataFrame([1, 2, 3, 4, 5, 6])
dienp = np.array([1, 2, 3, 4, 5, 6])
np.random.seed(1)

# Create the throw1 and throw2 columns from dienp
throw1_np = np.random.choice(dienp, throws, replace=True)
throw2_np = np.random.choice(dienp, throws, replace=True)

# Create the throw1 and throw2 columns from diepd
throw1_pd = diepd.sample(throws, replace=True).values
throw2_pd = diepd.sample(throws, replace=True).values

# Add throw1 and throw2 to obtain the sums
sum_np = throw1_np + throw2_np
sum_pd = throw1_pd + throw2_pd

df = pd.DataFrame(np.column_stack([throw1_np, throw2_np, sum_np, throw1_pd, throw2_pd, sum_pd]), columns=col_index)

print(df.head())
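
One thing to keep in mind when comparing the two halves: the seed is set only once, and the numpy and pandas draws are then taken one after another, so they are different draws and there is no reason for the columns to match. A possible variation, sketched below, reseeds before each library draws and numbers the throws from 1. The helper `with_sum` and the names `die_s`, `np_rolls`, and `pd_rolls` are my own. Whether the two halves then come out identical depends on both libraries consuming the seeded stream in the same way, which I am assuming here rather than guaranteeing, so the code only prints the check.

```python
import numpy as np
import pandas as pd

throws = 50
die = np.array([1, 2, 3, 4, 5, 6])
die_s = pd.Series(die)  # hypothetical name: the same die as a Series

# Draw both dice for every throw in one call: shape (throws, 2).
np.random.seed(1)
np_rolls = np.random.choice(die, size=(throws, 2), replace=True)

# Reseed so the pandas draw starts from the same state as the numpy draw.
np.random.seed(1)
pd_rolls = die_s.sample(throws * 2, replace=True).to_numpy().reshape(throws, 2)

def with_sum(rolls):
    """Append the row-wise sum as a third column."""
    return np.column_stack([rolls, rolls.sum(axis=1)])

# from_product builds the same six (library, column) pairs as the zip above.
columns = pd.MultiIndex.from_product(
    [["numpy", "pandas"], ["throw1", "throw2", "sum"]]
)
df = pd.DataFrame(
    np.hstack([with_sum(np_rolls), with_sum(pd_rolls)]),
    columns=columns,
    index=pd.RangeIndex(1, throws + 1, name="throw"),
)

print(df.head())
# True only if both libraries produced the same values for every throw.
print(df["numpy"].equals(df["pandas"]))
```

If the last line prints False on your versions, the frame is still valid; it just means the two sampling calls did not consume the random stream identically.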

