python-3.x How to fill NA in pandas by the mode of a group

lvmkulzt posted on 2023-01-10 in Python


I have a pandas DataFrame like this:

df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   NaN
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   NaN
       a2                   b2
       a3                   NaN

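For each value of a, b can take several values. I want to fill every NaN in b with the mode of b within the group defined by the corresponding value of a.
The resulting DataFrame should look like this: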

df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   ***b1***
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   **b2**
       a2                   b2
       a3                   b2

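Above, b1 is the mode of b for a1, and likewise b2 is the mode for a2. Finally, a3 has no data, so it is filled with the global mode, b2.
For each NaN in column b, I want to fill it with the mode of column b, but computed only over the rows with that particular value of a, whatever that mode turns out to be.
Edit:
If a group of a has no data in b, fill it with the global mode.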
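
For reference, the modes described above can be inspected directly. This is a minimal sketch that rebuilds the example table (assuming the missing entries are real `NaN` values rather than strings); the names `group_modes` and `global_mode` are only illustrative.

```python
import numpy as np
import pandas as pd

# The example frame from the question, with NaN for the missing values
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})

# Series.mode() skips NaN by default and returns a Series:
# ties give several values, a group with no data gives an empty result
group_modes = {name: grp.mode().tolist() for name, grp in df.groupby("a")["b"]}
global_mode = df["b"].mode().tolist()

print(group_modes)
print(global_mode)
```

The empty result for a3 is why that group needs a separate fallback: there is no first element to take from its mode.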

python-3.x

Source: https://stackoverflow.com/questions/63944973/how-to-fill-na-in-pandas-by-the-mode-of-a-group

2 Answers


jdg4fx2g1#

Try this:

# lazy grouping
groups = df.groupby('a')

# rows whose whole group is NaN in b
all_na = groups['b'].transform(lambda x: x.isna().all())

# fill those groups with the global mode
df.loc[all_na, 'b'] = df['b'].mode()[0]

# fill the remaining NaN values with each group's own mode
mode_by_group = groups['b'].transform(lambda x: x.mode()[0])
df['b'] = df['b'].fillna(mode_by_group)
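
The steps in this answer can be wrapped into a function and checked against the table from the question. This is only a sketch: `fill_by_group_mode` and its parameters are hypothetical names, it assumes the column has at least one non-NaN value overall, and when several values tie for the mode it takes the first one that `mode()` returns.

```python
import numpy as np
import pandas as pd

def fill_by_group_mode(df, key="a", col="b"):
    """Hypothetical helper: fill NaN in `col` with the mode of its `key` group,
    using the global mode for groups that have no data at all."""
    out = df.copy()
    # True for every row of a group whose `col` values are all NaN
    all_na = out[col].isna().groupby(out[key]).transform("all")
    # Assumes `col` is not entirely NaN, otherwise mode() is empty
    out.loc[all_na, col] = out[col].mode()[0]
    # Every group now has at least one value, so mode()[0] exists
    mode_by_group = out.groupby(key)[col].transform(lambda x: x.mode()[0])
    out[col] = out[col].fillna(mode_by_group)
    return out

df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})
result = fill_by_group_mode(df)
print(result)
```

Working on a copy leaves the input frame unchanged, which makes it easy to compare the result with the original.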

2023-01-10

lsmd5eda2#

You are getting IndexError: index out of bounds because the last value in column a, a3, has no corresponding values in column b, so that group has nothing to fill from. One solution is to wrap the fillna call in a try/except block and then apply ffill and bfill. Here is the code:

import numpy as np
import pandas as pd

data_stack = [['a1', 'b1'], ['a1', 'b2'], ['a1', 'b1'], ['a1', np.nan], ['a2', 'b1'],
              ['a2', 'b2'], ['a2', 'b2'], ['a2', np.nan], ['a2', 'b2'], ['a3', np.nan]]
df_try_stack = pd.DataFrame(data_stack, columns=["a", "b"])

# Fill the NaN values of a group with that group's mode
def fillna_group(grp):
    try:
        return grp.fillna(grp.mode()[0])
    except (IndexError, KeyError) as e:
        # The group has no non-NaN values, so it has no mode; return it unchanged
        print('Error as no corresponding group: ' + str(e))
        return grp

df_try_stack["b"] = df_try_stack["b"].fillna(
    df_try_stack.groupby("a")["b"].transform(fillna_group))
# Groups that are still NaN take the value of a neighbouring row
df_try_stack = df_try_stack.ffill(axis=0)
df_try_stack = df_try_stack.bfill(axis=0)
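
One thing to keep in mind with ffill and bfill is that they copy the value of a neighbouring row, so the result for an empty group depends on row order and matches the global mode only when the neighbour happens to hold it. The sketch below is an order-independent variant, not the approach of this answer: it computes one mode per group with `agg`, maps it back onto column a, and then falls back to the global mode. The helper `first_mode` is a hypothetical name, and the rows are reordered on purpose so that a3 comes first.

```python
import numpy as np
import pandas as pd

def first_mode(s):
    """Hypothetical helper: first mode of a Series, or NaN when it has no data."""
    m = s.mode()
    return m.iloc[0] if not m.empty else np.nan

# Same data as the question, but with the empty group a3 placed first
df = pd.DataFrame({
    "a": ["a3", "a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2"],
    "b": [np.nan, "b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2"],
})

group_mode = df.groupby("a")["b"].agg(first_mode)   # one value per group, NaN for a3
filled = df["b"].fillna(df["a"].map(group_mode))    # fill from the row's own group
filled = filled.fillna(first_mode(df["b"]))         # global mode for empty groups
df["b"] = filled
print(df)
```

With this row order, bfill would give a3 the value b1 from the row below it, whereas the global mode of the column is b2.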

2023-01-10
