Here is the code for the DataFrame:
import numpy as np
import pandas as pd
Data_Frame1 = {
"company": ["A","B","C","A","A","B","C","B","C"],
"employee": [10,12,13,10,51,11,12,12,12],
"salary":[2,"unknown",4,"unknown",5,"unknown",8,8,4],
"compartment":["madhyapradesh","uttarpradesh","gujarat","madhyapradesh","uttarpradesh","uttarpradesh","gujarat","gujarat","madhyapradesh"]
}
df_1 = pd.DataFrame(Data_Frame1)
df_1
Its output looks like this: [image: This is dataframe]
For the mode, I wrote the following code:
emp=df_1.groupby('company')[['employee','salary',"compartment"]].agg(lambda x: pd.Series.mode(x)[0])
emp
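Its output looks like this:
Here I replaced the null values with "unknown". For company B, the salary column holds three values, ["unknown", "unknown", 8]. Because the code takes the mode, it returns "unknown" as the result, but I want it to return 8 as the mode, since "unknown" only stands in for null values here. What code would achieve this?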
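To illustrate the behaviour described in the question, here is a minimal sketch that reproduces it on company B's salary values alone. The variable names `b_salary` and `b_mode` are made up for this illustration and do not appear in the original code.

```python
import pandas as pd

# Company B's salary values from the question, with "unknown" standing in for nulls.
b_salary = pd.Series(["unknown", "unknown", 8])

# "unknown" occurs twice and 8 only once, so the placeholder wins the count.
b_mode = b_salary.mode()[0]
print(b_mode)
```

The placeholder is an ordinary string as far as pandas is concerned, so it is counted like any other value.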
1 Answer
xdyibdwo #1
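You can replace unknown with NaN so that mode does not count it (mode skips NaN by default). If every value in a group is unknown, your solution fails, because mode then returns an empty Series and indexing it with [0] raises an error. To handle that case, use the iter and next trick to take the first mode if one exists and fall back to NaN otherwise. You can test this on new data.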
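The answer's original code did not survive on this page, so the following is only a sketch of the approach it describes, not the author's exact code. It assumes the "unknown" placeholder is turned back into NaN with `DataFrame.replace`. The helper name `first_mode` and the names `cols`, `cleaned`, `df_2` and `emp_2` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({
    "company": ["A", "B", "C", "A", "A", "B", "C", "B", "C"],
    "employee": [10, 12, 13, 10, 51, 11, 12, 12, 12],
    "salary": [2, "unknown", 4, "unknown", 5, "unknown", 8, 8, 4],
    "compartment": ["madhyapradesh", "uttarpradesh", "gujarat",
                    "madhyapradesh", "uttarpradesh", "uttarpradesh",
                    "gujarat", "gujarat", "madhyapradesh"],
})

cols = ["employee", "salary", "compartment"]


def first_mode(values):
    # Hypothetical helper: Series.mode() skips NaN by default, and
    # next(iter(...), np.nan) returns NaN when the group has no mode at all.
    return next(iter(values.mode()), np.nan)


# Turn the placeholder back into a real missing value so mode() ignores it.
cleaned = df_1.replace("unknown", np.nan)
emp = cleaned.groupby("company")[cols].agg(first_mode)
print(emp)

# New data for testing: every salary of company B is unknown.
df_2 = df_1.copy()
df_2.loc[df_2["company"] == "B", "salary"] = "unknown"
emp_2 = df_2.replace("unknown", np.nan).groupby("company")[cols].agg(first_mode)
print(emp_2)
```

With the original data, the salary mode for company B comes out as 8 instead of "unknown". In the second frame company B has no known salary, so its salary mode is NaN rather than an error. Some pandas versions may emit a FutureWarning about downcasting when `replace` is called on an object column; it does not change the result here.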