Pandas合并行,其中特定列数据被拆分到多个行

jaxagkaj  于 2023-02-27  发布在  其他
关注(0)|答案(2)|浏览(142)

从PDF中提取表格产生以下 Dataframe :

Date      Transaction Details  Withdrawals  Deposits   Balance
0   01-01-2020  Tx1-Description - Line1       1625.0       NaN  97994.82
1          NaN                   Line 2          NaN       NaN       NaN
2   01-01-2020  Tx2-Description - Line1          NaN  84994.82  90000.00
3          NaN                   Line 2          NaN       NaN       NaN
4          NaN                   Line 3          NaN       NaN       NaN
5   02-01-2020  Tx3-Description - Line1         71.0       NaN  84923.82
6          NaN                   Line 2          NaN       NaN       NaN
7   02-01-2020  Tx4-Description - Line1          NaN     80.00  90000.00
8          NaN                   Line 2          NaN       NaN       NaN
9          NaN                   Line 3          NaN       NaN       NaN
10  03-01-2020  Tx5-Description - Line1        100.0       NaN  85000.00

如何正确合并Transaction Details列?
预期输出:

Date      Transaction Details              Withdrawals  Deposits  Balance
0   01-01-2020  Tx1-Description - Line1 Line 2         1625.0      NaN       97994.82
1   01-01-2020  Tx2-Description - Line1 Line 2 Line 3  NaN         84994.82  90000.00
2   02-01-2020  Tx3-Description - Line1 Line 2         71.0        NaN       84923.82
3   02-01-2020  Tx4-Description - Line1 Line 2 Line 3  NaN         80.00     90000.00
4   03-01-2020  Tx5-Description - Line1                100.0       NaN       85000.00
9vw9lbht

9vw9lbht1#

IIUC,您可以使用“日期”进行groupby分组,然后汇总:

(df.groupby(df['Date'].notna().cumsum(), as_index=False)
   .agg({'Date': 'first', 'Transaction Details': ' '.join,
         'Withdrawals': 'sum', 'Deposits': 'sum', 'Balance': 'sum'})
)
  • 注意,NaNs变为0,但如果需要,您可以replace(0, float('nan')) *

输出:

Date                    Transaction Details  Withdrawals  Deposits   Balance
0  01-01-2020         Tx1-Description - Line1 Line 2       1625.0      0.00  97994.82
1  01-01-2020  Tx2-Description - Line1 Line 2 Line 3          0.0  84994.82  90000.00
2  02-01-2020         Tx3-Description - Line1 Line 2         71.0      0.00  84923.82
3  02-01-2020  Tx4-Description - Line1 Line 2 Line 3          0.0     80.00  90000.00
4  03-01-2020                Tx5-Description - Line1        100.0      0.00  85000.00
x4shl7ld

x4shl7ld2#

df1.loc[:,:"Transaction Details"].assign(col1=lambda dd:dd.Date.notna().cumsum())\
    .assign(col2=lambda dd:dd.index)\
    .groupby("col1").agg(**{"Transaction Details":("Transaction Details", "".join),"col2":("col2","first")}).rename_axis(None)\
    .join(df1.drop("Transaction Details",axis=1),on='col2')

输出:

Date                    Transaction Details  Withdrawals  Deposits   Balance
0  01-01-2020         Tx1-Description - Line1 Line 2       1625.0      0.00  97994.82
1  01-01-2020  Tx2-Description - Line1 Line 2 Line 3          0.0  84994.82  90000.00
2  02-01-2020         Tx3-Description - Line1 Line 2         71.0      0.00  84923.82
3  02-01-2020  Tx4-Description - Line1 Line 2 Line 3          0.0     80.00  90000.00
4  03-01-2020                Tx5-Description - Line1        100.0      0.00  85000.00

相关问题