pandas 修正了点-逗号混淆,在将数组列dtype“object”转换为“float”时产生NaN的问题,

0wi1tuuw  于 2023-10-14  发布在  其他
关注(0)|答案(2)|浏览(106)

我有以下dataframe:

ID customer Month   Amount
0   026         201707  31,65
1   026         201708  31,65
2   026         201709  31,65
3   026         201710  31,65
4   026         201711  31,65

其中Amount最初是object类型。我想计算每个ID的总和和平均值。首先,我尝试将Amount列从object转换为float

df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')

但我得到了Amount列中所有值的NaN

ID customer Month   Amount
0   026         201707  NaN

如何将列对象类型转换为具有实际数字的float,然后聚合每个客户的值(总和,平均值)?

ie3xauqp

ie3xauqp1#

pd.to_numeric之前使用Series.str.replace,转换为.然后您可以使用groupby.agg

agg_df = (df.assign(Amount = pd.to_numeric(df['Amount'].str.replace(',','.'),
                                           errors = 'coerce'))
            .groupby('ID').Amount.agg(['mean','sum']))
print(agg_df)
#if you want change the type of Amount previously
#df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
#agg_df = df.groupby('ID').Amount.agg(['mean','sum']))

     mean    sum
ID              
0   31.65  31.65
1   31.65  31.65
2   31.65  31.65
3   31.65  31.65
4   31.65  31.65

如果你想聚合到初始化的框架,使用GroupBy.transform

groups = pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce').groupby(df['ID'])
#if you want change the type of Amount previously
#df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
#groups = df.groupby('ID')['Amount']
df['mean'] = groups.transform('mean')
df['sum'] = groups.transform('sum')
print(df)
   ID  customer   Month Amount   mean    sum  
0   0        26  201707  31,65  31.65  31.65  
1   1        26  201708  31,65  31.65  31.65  
2   2        26  201709  31,65  31.65  31.65  
3   3        26  201710  31,65  31.65  31.65  
4   4        26  201711  31,65  31.65  31.65
kqqjbcuj

kqqjbcuj2#

更简单的语法可用吗?接受答案的第一部分没有问题:

df['Amount'] = pd.to_numeric(df['Amount'].str.replace(',','.'))

关于分组,pivot允许更简单的语法:

pd.pivot_table(data = df,
               values = ['Amount'],
               index = ['ID customer'],
               aggfunc = [np.sum, np.mean])

输出量:

sum   mean
             Amount Amount
ID customer               
026          158.25  31.65

相关问题