pandas: get the standard deviation of values from two different DataFrames

vwoqyblh posted on 2022-11-27 in Other

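I have two DataFrames. For each rc_id, I want to compute the standard deviation of one column (the impacted_users column that exists in both DataFrames) and store it in a separate column named std.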
df1

import pandas as pd

data = {"timestamp":["2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29"],
       "rc_id":[296,296,296,296,296,100,100,100,100],
       "impacted_users":[1,87,44,8,5,2,7,11,30]}

df1 = pd.DataFrame(data)
# Total impacted_users per (timestamp, rc_id)
df1 = df1.groupby(["timestamp","rc_id"]).agg({"impacted_users": "sum"}).reset_index()

df1:

rc_id           timestamp            impacted_users
     296           2022-10-29                  145
     100           2022-10-29                   50

df2

data1 = {"rc_id":[296,296,296,100,100,100],
         "impacted_users":[201,202,216,300,301,350]}

df2 = pd.DataFrame(data1)
print(df2)

df2:

rc_id            impacted_users
     296                201
     296                202
     296                216
     100                300
     100                301
     100                350

Expected Output:

rc_id        timestamp             impacted_users  std
 296          2022-10-29              145          27.21
 100          2022-10-29               50          117.36

I want std as a separate column. For example, these are the values it should be computed from:

std(145, 201, 202, 216)
std(50, 300, 301, 350)
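As a quick check of the arithmetic, the sketch below computes both kinds of standard deviation for the two value lists using Python's standard `statistics` module (the `groups` dict is only a hypothetical container for the lists above). The population version (`pstdev`, dividing by n) matches the expected std column to about two decimals, while the sample version (`stdev`, dividing by n - 1) gives larger values.

```python
import statistics

# Hypothetical container: the df1 total for each rc_id followed by its df2 rows
groups = {
    296: [145, 201, 202, 216],
    100: [50, 300, 301, 350],
}

for rc_id, values in groups.items():
    pop = statistics.pstdev(values)   # population std: divides by n (pandas ddof=0)
    samp = statistics.stdev(values)   # sample std: divides by n - 1 (pandas ddof=1)
    print(rc_id, round(pop, 2), round(samp, 2))
```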

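I can't come up with a strategy for getting this standard deviation across two different DataFrames. I tried concatenating the required values and then aggregating to get the standard deviation, but I suspect there is a better way.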

gmxoilav


IIUC, use concat and aggregate with std. Because pandas Series.std defaults to ddof=1, pass ddof=0 to get the expected output, and finally join the result to df1:

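# Aggregate the raw df1 from the question; sort=False keeps rc_id in first-seen order (296, then 100)
df1 = df1.groupby(["timestamp","rc_id"], as_index=False, sort=False)["impacted_users"].sum()

# Stack df1 and df2, take the population std (ddof=0) per rc_id, and join it back on rc_id
df = (df1.join(pd.concat([df1, df2])
                 .groupby('rc_id')['impacted_users'].std(ddof=0).rename('std'), on='rc_id'))
print(df)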
    timestamp  rc_id  impacted_users         std
0  2022-10-29    296             145   27.212130
1  2022-10-29    100              50  117.367745
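For reference, here is the answer's approach assembled into one self-contained script with the question's data. It is a sketch that restates the same steps (concat, per-rc_id std with ddof=0, join); the intermediate names `combined`, `std_per_id` and `result` are illustrative only.

```python
import pandas as pd

# Raw per-row data from the question (all rows share one timestamp)
df1 = pd.DataFrame({
    "timestamp": ["2022-10-29"] * 9,
    "rc_id": [296, 296, 296, 296, 296, 100, 100, 100, 100],
    "impacted_users": [1, 87, 44, 8, 5, 2, 7, 11, 30],
})
df2 = pd.DataFrame({
    "rc_id": [296, 296, 296, 100, 100, 100],
    "impacted_users": [201, 202, 216, 300, 301, 350],
})

# Total impacted_users per (timestamp, rc_id); sort=False keeps first-seen order
df1 = df1.groupby(["timestamp", "rc_id"], as_index=False, sort=False)["impacted_users"].sum()

# Stack the df1 totals on top of the df2 rows; df2 has no timestamp column,
# so its rows get NaN there, which does not affect the std of impacted_users
combined = pd.concat([df1, df2], ignore_index=True)

# Population standard deviation (ddof=0) per rc_id, as a Series named "std"
std_per_id = combined.groupby("rc_id")["impacted_users"].std(ddof=0).rename("std")

# Attach the std to each df1 row by matching df1.rc_id against the Series index
result = df1.join(std_per_id, on="rc_id")
print(result)
```

An equivalent last step would be `df1["std"] = df1["rc_id"].map(std_per_id)`; `join` is kept here to mirror the answer. Dropping `ddof=0` would give the sample standard deviation instead.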
