I have data with the columns ['color', 'fruit', 'date', 'value'], where 'color' and 'fruit' are the grouping levels.
import numpy as np
import pandas as pd

data = pd.DataFrame({'color': ['Green', 'Green', 'Green', 'Green', 'Red', 'Red'],
                     'fruit': ['Banana', 'Banana', 'Apple', 'Apple', 'Banana', 'Apple'],
                     'date': ['2011-01-01', '2011-01-02', '2011-01-01', '2011-01-02', '2011-02-01', '2011-02-01'],
                     'value': [1, np.nan, np.nan, 2, 3, np.nan]})
Output:
   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    NaN
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
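I need to forward-fill 'value' for the dates where we have no data. The fill must stay within each ['color', 'fruit'] group.
I tried filling with df = df.groupby(['color', 'fruit', 'date'])['value'].mean().replace(to_replace=0, method='ffill')
but this spills values over into the next [color, fruit] group.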
Expected Output:
   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
1 Answer
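You can use GroupBy.cumcount together with pandas.Series.ffill. Or, as @Mustafa Aydın noted, simply use GroupBy.ffill, which forward-fills within each group and therefore never carries a value across a group boundary.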
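A minimal sketch of the GroupBy.ffill approach, assuming the rows are already in date order within each ['color', 'fruit'] group (sort by 'date' first if they are not). The variable name filled is only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'color': ['Green', 'Green', 'Green', 'Green', 'Red', 'Red'],
    'fruit': ['Banana', 'Banana', 'Apple', 'Apple', 'Banana', 'Apple'],
    'date': ['2011-01-01', '2011-01-02', '2011-01-01', '2011-01-02',
             '2011-02-01', '2011-02-01'],
    'value': [1, np.nan, np.nan, 2, 3, np.nan],
})

# Work on a copy so the original frame stays untouched.
filled = data.copy()

# Forward-fill 'value' separately inside each (color, fruit) group.
# The result keeps the original index, so it assigns back directly.
filled['value'] = filled.groupby(['color', 'fruit'])['value'].ffill()

print(filled)
```

A NaN at the start of a group (rows 2 and 5 here) stays NaN, because there is no earlier value inside that group to carry forward. That matches the expected output in the question.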