pandas: how to ignore a specific value in a DataFrame column when computing the mode?

Posted by ycl3bljg on 2022-12-16 in Other

This is the code for the DataFrame:

import numpy as np
import pandas as pd

Data_Frame1 = {
  "company": ["A","B","C","A","A","B","C","B","C"],
  "employee": [10,12,13,10,51,11,12,12,12],
  "salary":[2,"unknown",4,"unknown",5,"unknown",8,8,4],
  "compartment":["madhyapradesh","uttarpradesh","gujarat","madhyapradesh","uttarpradesh","uttarpradesh","gujarat","gujarat","madhyapradesh"]
} 
df_1 = pd.DataFrame(Data_Frame1)
df_1

Its output looks like this: [image: This is dataframe]
To get the mode, I wrote the following code:

emp=df_1.groupby('company')[['employee','salary',"compartment"]].agg(lambda x: pd.Series.mode(x)[0])
emp

Its output looks like this:

[image: The output of above code]

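Here I replaced the null values with unknown. Company B has three values in the salary column, ["unknown", "unknown", 8]. Because the code takes the mode, the result for B is unknown, but I want 8 to be the mode, since unknown only stands in for missing values. What code would achieve this?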

xdyibdwo #1

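You can replace unknown with NaN so that it is not counted when computing the mode. Indexing the result with [0], as in your solution, then fails if every value in a group is unknown, because the mode is empty. So use the next(iter(...)) trick to get the first mode if one exists, and NaN otherwise: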

emp=(df_1.replace('unknown', np.nan)
        .groupby('company')[['employee','salary',"compartment"]]
        .agg(lambda x: next(iter(x.mode()), np.nan)))
print(emp)
         employee  salary    compartment
company                                 
A              10     2.0  madhyapradesh
B              12     8.0   uttarpradesh
C              12     4.0        gujarat
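
To see why the answer avoids indexing with [0], the trick can be looked at on a single Series. This is only a minimal sketch: the helper name first_mode and the two sample Series are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

def first_mode(values: pd.Series):
    """Return the first mode of values, or NaN if mode() comes back empty."""
    # Series.mode() drops NaN by default, so an all-NaN Series yields an
    # empty result; next() then falls back to the np.nan default.
    return next(iter(values.mode()), np.nan)

# Hypothetical samples: one group with a real value, one with none.
mixed = pd.Series([np.nan, np.nan, 8], dtype="float64")
all_missing = pd.Series([np.nan, np.nan], dtype="float64")

print(first_mode(mixed))
print(first_mode(all_missing))

# Indexing the empty mode directly raises instead of returning NaN.
try:
    all_missing.mode()[0]
except (KeyError, IndexError) as exc:
    print("indexing with [0] failed:", type(exc).__name__)
```

The default argument of next() is what turns an empty mode into NaN rather than an exception.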

You can also test it on new data, for example with a group in which every value is unknown.
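
The test data from the original answer did not survive on this page, so the following is a stand-in sketch rather than the answerer's example. The frame df_2 and its values are hypothetical. Company D has only unknown salaries, which is the case the plain [0] indexing cannot handle.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: every salary for company D is "unknown".
df_2 = pd.DataFrame({
    "company": ["A", "A", "B", "B", "B", "D", "D"],
    "employee": [10, 10, 12, 11, 12, 7, 7],
    "salary": [2, "unknown", "unknown", "unknown", 8, "unknown", "unknown"],
})

# Same approach as the answer: mask the placeholder, then take the first
# mode per group, falling back to NaN when a group has no usable values.
emp_2 = (df_2.replace("unknown", np.nan)
             .groupby("company")[["employee", "salary"]]
             .agg(lambda x: next(iter(x.mode()), np.nan)))
print(emp_2)
```

With this data the salary for B should come out as 8 and the salary for D as NaN. Recent pandas versions may print a deprecation warning for replace on an object column.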
