Fill missing values for each group using ffill and bfill

e5njpo68 posted on 2021-05-27 in Spark

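I'm trying to fill in missing values for each group (columns: area_code, shop_name, item_name, week_date, sales_amount). For each group I need to fill sales_amount for 52 weeks of data. I need to create two separate columns, one with ffill (df.ffill()) and one with bfill (df.bfill()), and then combine the two new columns as (ffill + bfill) / 2 to get the result.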
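A minimal pandas sketch of the two-column approach described above. It assumes a group is one (area_code, shop_name, item_name) combination and that the helper column names `ffill`, `bfill` and `filled` are free to use (both are assumptions, not from the question). `groupby(...).ffill()` and `.bfill()` only fill within a group, so values never leak from one item or area into another.

```python
import numpy as np
import pandas as pd

# Rows copied from the "Before" table in the question.
rows = [
    (101, "Global Market", "Mango Fruits", "6/3/2018", 5.13),
    (101, "Global Market", "Mango Fruits", "6/10/2018", np.nan),
    (101, "Global Market", "Mango Fruits", "6/17/2018", 7.13),
    (101, "Global Market", "Chips", "6/3/2018", 5.0),
    (101, "Global Market", "Chips", "6/10/2018", np.nan),
    (102, "Global Market", "Mango Fruits", "6/3/2018", 10.34),
    (102, "Global Market", "Mango Fruits", "6/10/2018", np.nan),
    (102, "Global Market", "Chips", "6/10/2018", np.nan),
    (102, "Global Market", "Chips", "6/17/2018", np.nan),
    (102, "Global Market", "Chips", "6/24/2018", np.nan),
    (102, "Global Market", "Potato", "6/24/2018", np.nan),
]
df = pd.DataFrame(rows, columns=["area_code", "shop_name", "item_name",
                                 "week_date", "sales_amount"])
df["week_date"] = pd.to_datetime(df["week_date"], format="%m/%d/%Y")

# Assumed group key: one weekly series per area/shop/item.
keys = ["area_code", "shop_name", "item_name"]
# Sort so "previous"/"next" row means previous/next week inside a group.
df = df.sort_values(keys + ["week_date"]).reset_index(drop=True)

g = df.groupby(keys)["sales_amount"]
df["ffill"] = g.ffill()  # last known value carried forward, per group
df["bfill"] = g.bfill()  # next known value carried backward, per group

# Literal (ffill + bfill) / 2: NaN wherever either side is missing.
df["filled"] = (df["ffill"] + df["bfill"]) / 2
print(df)
```

With the literal formula, the interior Mango Fruits gap in area 101 becomes the average of 5.13 and 7.13, but rows that only have a value on one side (such as the last Chips week in area 101) stay NaN, because NaN propagates through the addition.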

Before

area_code  shop_name      item_name     week_date  sales_amount
101        Global Market  Mango Fruits  6/3/2018   5.13
101        Global Market  Mango Fruits  6/10/2018  nan
101        Global Market  Mango Fruits  6/17/2018  7.13
101        Global Market  Chips         6/3/2018   5
101        Global Market  Chips         6/10/2018  nan
102        Global Market  Mango Fruits  6/3/2018   10.34
102        Global Market  Mango Fruits  6/10/2018  nan
102        Global Market  Chips         6/10/2018  nan
102        Global Market  Chips         6/17/2018  nan
102        Global Market  Chips         6/24/2018  nan
102        Global Market  Potato        6/24/2018  nan

After

area_code  shop_name      item_name     week_date  sales_amount
101        Global Market  Mango Fruits  6/3/2018   5.13
101        Global Market  Mango Fruits  6/10/2018  6.13
101        Global Market  Mango Fruits  6/17/2018  7.13
101        Global Market  Chips         6/3/2018   5
101        Global Market  Chips         6/10/2018  5
102        Global Market  Mango Fruits  6/3/2018   10.34
102        Global Market  Mango Fruits  6/10/2018  10.34
102        Global Market  Chips         6/10/2018  Value available before this week for this group
102        Global Market  Chips         6/17/2018  Value available before this week for this group
102        Global Market  Chips         6/24/2018  Value available before this week for this group
102        Global Market  Potato        6/24/2018  Value available before this week for this group
For example:

Week 1  10
Week 2  nan
Week 3  nan

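"Value available before this week for this group" means that weeks 2 and 3 get the same value as week 1. Otherwise, if both week 1 and week 3 have data, week 2 is filled from the ffill and bfill values. If the data fits neither case, just fill each group with its ffill or bfill value.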
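The rule in the paragraph above can be expressed as a small per-group function. This is a sketch under the assumption that each group's series is already sorted by week; the name `fill_gaps` is hypothetical. Interior gaps get the mean of ffill and bfill, leading or trailing gaps fall back to whichever side exists, and a group with no values at all stays NaN.

```python
import numpy as np
import pandas as pd

def fill_gaps(s: pd.Series) -> pd.Series:
    """Fill NaNs in one group's weekly series (assumed sorted by week).

    - gap with known values on both sides: mean of ffill and bfill
    - leading/trailing gap: whichever of ffill/bfill exists
    - group with no values at all: stays NaN
    """
    fwd = s.ffill()
    bwd = s.bfill()
    # NaN where either side is missing, then fall back to the side that exists.
    return ((fwd + bwd) / 2).fillna(fwd).fillna(bwd)

# The question's example: week 1 known, weeks 2 and 3 missing.
example = pd.Series([10, np.nan, np.nan], index=["Week 1", "Week 2", "Week 3"])
print(fill_gaps(example).tolist())  # [10.0, 10.0, 10.0]
```

To apply it per group, sort by `week_date` and pass it to `transform`, e.g. `df["sales_amount"] = df.groupby(["area_code", "shop_name", "item_name"])["sales_amount"].transform(fill_gaps)`; `transform` returns a result aligned with the original rows, so no manual loop over groups is needed.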
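How do I iterate over the DataFrame?
How do I iterate over each group and fill in the values?
I tried the approach from "Pandas: fill missing values with the mean of each group", but had no luck.
The weekly data I need to fill starts on 2018-06-03 and ends on 2019-06-03.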

Answer by 4xrmg8kj

Forward fill and backward fill are very easy in Pandas. I had the same requirement before and followed the approach described here:
https://johnpaton.net/posts/forward-fill-spark/