pandas中最后n个数据可用列的行平均值

nfs0ujit  于 2023-04-10  发布在  其他
关注(0)|答案(3)|浏览(114)

我有一个 Dataframe 如下;
| 身份证|扬|二月|马尔|四月|五月|
| --------------|--------------|--------------|--------------|--------------|--------------|
| 1|10个|三十||||
| 二|10个|二十|五十|五十|六十|
| 三||||七十|五十|
| 四||三十|四十|||
我想得到最后两列的行平均值(仅在数据可用的情况下)
预期产出如下;
| 身份证|扬|二月|马尔|四月|五月|平均|
| --------------|--------------|--------------|--------------|--------------|--------------|--------------|
| 1|10|30人||||二十|
| 二|10个|二十|五十|50|60人|五十五|
| 三||||70|50人|六十|
| 四||**30|**40|||三十五|
我的实际数据框有n个更大的列,我不能硬编码列名。我该如何继续前进呢?

63lcw9qa

63lcw9qa1#

计算掩码以仅保留具有notnacumsum的相关单元格:

N = 2
m = df.loc[:, ::-1].notna().cumsum(axis=1).le(N)

df['average'] = df.drop(columns='id').where(m).mean(axis=1)

您还可以利用stack来摆脱NaN,然后获得每个ID的最后N个值:

N = 2
df['average'] = (
 df.set_index('id').stack()
   .groupby(level='id')
   .apply(lambda x: x.tail(N).mean())
   .reindex(df['id']).values
)

输出:

id   jan   feb   mar   apr   may  average
0   1  10.0  30.0   NaN   NaN   NaN     20.0
1   2  10.0  20.0  50.0  50.0  60.0     55.0
2   3   NaN   NaN   NaN  70.0  50.0     60.0
3   4   NaN  30.0  40.0   NaN   NaN     35.0
rseugnpd

rseugnpd2#

使用agg函数

import pandas as pd
import numpy as np

df=pd.DataFrame({'id':[1, 2, 3, 4],
                 'jan':[10, 10, np.nan, np.nan],
                 'feb':[30, 20, np.nan, 30],
                 'mar':[np.nan, 50, np.nan, 40],
                 'apr':[np.nan, 50, 70, np.nan],
                 'may':[np.nan, 60, 50, np.nan],
                 })

N=2
df['LastN_Mean'] = df[df.columns[1:]].agg(lambda r: r[r.notna()][-N:].mean(), axis=1)

print(df)
id   jan   feb   mar   apr   may  LastN_Mean
0   1  10.0  30.0   NaN   NaN   NaN        20.0
1   2  10.0  20.0  50.0  50.0  60.0        55.0
2   3   NaN   NaN   NaN  70.0  50.0        60.0
3   4   NaN  30.0  40.0   NaN   NaN        35.0
ejk8hzay

ejk8hzay3#

col1=df1.apply(lambda ss:ss.dropna().tail(2).mean(),1)
df1.assign(average=col1)

输出:

id   jan   feb   mar   apr   may  average
0   1  10.0  30.0   NaN   NaN   NaN     20.0
1   2  10.0  20.0  50.0  50.0  60.0     55.0
2   3   NaN   NaN   NaN  70.0  50.0     60.0
3   4   NaN  30.0  40.0   NaN   NaN     35.0

相关问题