pandas: sum a DataFrame's values between dates given in another DataFrame

Posted by 50few1ms on 2023-10-14 in Other

Suppose I have two pandas DataFrames, df1 and df2:

import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C'], 
                    'Date1':['2023-01-01', '2023-01-02', '2023-01-03'], 
                    'Date2':['2023-01-03', '2023-01-04', '2023-01-05']})
df1.loc[:, ['Date1', 'Date2']] = df1.loc[:, ['Date1', 'Date2']].apply(pd.to_datetime, errors='coerce')

df2 = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
                    'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11,12,13,14,15]})
df2['Date'] = pd.to_datetime(df2['Date'])

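Is there an efficient way to add a column named "Sum" to df1 that, for each row, sums the values in the df2 column matching Name whose Date falls between that row's Date1 and Date2 (excluding Date1, including Date2)?
The expected result is: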

df_result = pd.DataFrame({'Name': ['A', 'B', 'C'], 
                          'Date1':['2023-01-01', '2023-01-02', '2023-01-03'], 
                          'Date2':['2023-01-03', '2023-01-04', '2023-01-05'],
                          'Sum': [2+3, 8+9, 14+15]})

Answer 1, by t9aqgxwy

Step-by-step solution

Melt df2 into long format, with one row per (Date, Name) pair

df = df2.melt(id_vars=['Date'], var_name='Name')

#          Date Name  value
# 0  2023-01-01    A      1
# 1  2023-01-02    A      2
# 2  2023-01-03    A      3
# 3  2023-01-04    A      4
# 4  2023-01-05    A      5
# 5  2023-01-01    B      6
# 6  2023-01-02    B      7
# 7  2023-01-03    B      8
# 8  2023-01-04    B      9
# 9  2023-01-05    B     10
# 10 2023-01-01    C     11
# 11 2023-01-02    C     12
# 12 2023-01-03    C     13
# 13 2023-01-04    C     14
# 14 2023-01-05    C     15

Merge df1 with the melted DataFrame, then mask the values whose Date falls outside the bounds

df = df1.merge(df, on='Name', how='left')
# Date1 is excluded and Date2 is included, as the question asks
df['value'] = df['value'].where(df.eval('`Date1` < `Date` <= `Date2`'))

#    Name                Date1                Date2       Date  value
# 0     A  2023-01-01 00:00:00  2023-01-03 00:00:00 2023-01-01    NaN
# 1     A  2023-01-01 00:00:00  2023-01-03 00:00:00 2023-01-02    2.0
# 2     A  2023-01-01 00:00:00  2023-01-03 00:00:00 2023-01-03    3.0
# 3     A  2023-01-01 00:00:00  2023-01-03 00:00:00 2023-01-04    NaN
# 4     A  2023-01-01 00:00:00  2023-01-03 00:00:00 2023-01-05    NaN
# 5     B  2023-01-02 00:00:00  2023-01-04 00:00:00 2023-01-01    NaN
# 6     B  2023-01-02 00:00:00  2023-01-04 00:00:00 2023-01-02    NaN
# 7     B  2023-01-02 00:00:00  2023-01-04 00:00:00 2023-01-03    8.0
# 8     B  2023-01-02 00:00:00  2023-01-04 00:00:00 2023-01-04    9.0
# 9     B  2023-01-02 00:00:00  2023-01-04 00:00:00 2023-01-05    NaN
# 10    C  2023-01-03 00:00:00  2023-01-05 00:00:00 2023-01-01    NaN
# 11    C  2023-01-03 00:00:00  2023-01-05 00:00:00 2023-01-02    NaN
# 12    C  2023-01-03 00:00:00  2023-01-05 00:00:00 2023-01-03    NaN
# 13    C  2023-01-03 00:00:00  2023-01-05 00:00:00 2023-01-04   14.0
# 14    C  2023-01-03 00:00:00  2023-01-05 00:00:00 2023-01-05   15.0

Group by the columns of df1 and aggregate value with sum

df = df.groupby(list(df1.columns), as_index=False)['value'].sum()

#   Name      Date1      Date2  value
# 0    A 2023-01-01 2023-01-03    5.0
# 1    B 2023-01-02 2023-01-04   17.0
# 2    C 2023-01-03 2023-01-05   29.0
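
The melt, merge, mask, and group steps can be collected into one helper that names the result column Sum. This is only a sketch under a few assumptions: the helper name interval_sums is made up for illustration, the date columns are built directly as datetime values, and the mask uses plain boolean comparisons with the bounds the question asks for (Date1 excluded, Date2 included).

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Date1': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03']),
    'Date2': pd.to_datetime(['2023-01-03', '2023-01-04', '2023-01-05']),
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05']),
    'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15],
})

def interval_sums(df1, df2):
    """Hypothetical helper: sum df2[Name] over Date1 < Date <= Date2 per df1 row."""
    # Long format: one row per (Date, Name, value)
    long = df2.melt(id_vars='Date', var_name='Name')
    # Pair every df1 row with all dates available for its Name
    merged = df1.merge(long, on='Name', how='left')
    # Keep values inside the half-open interval (Date1, Date2]
    in_range = (merged['Date'] > merged['Date1']) & (merged['Date'] <= merged['Date2'])
    merged['value'] = merged['value'].where(in_range)
    # Collapse back to one row per df1 row; masked values (NaN) are skipped by sum
    out = merged.groupby(list(df1.columns), as_index=False, sort=False)['value'].sum()
    return out.rename(columns={'value': 'Sum'})

result = interval_sums(df1, df2)
print(result)
```

Because the grouping keys are the df1 columns, identical rows in df1 would be collapsed into one, so this sketch assumes each (Name, Date1, Date2) combination appears only once.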

Answer 2, by pzfprimi

I think the simplest approach is to use apply with a function that filters df2 by your condition and sums the matching column:

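def temp_func(x, df2):
    # Column of df2 that matches this row's Name
    col = str(x['Name'])
    # Keep rows with Date1 < Date <= Date2, then sum that column
    return df2[(df2['Date'] <= x['Date2']) & (df2['Date'] > x['Date1'])][col].sum()

# Apply the function row by row (axis=1) and store the result in a new column
df1['Sum'] = df1.apply(lambda x: temp_func(x, df2), axis=1)
print(df1)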

Output:

  Name                Date1                Date2  Sum
0    A  2023-01-01 00:00:00  2023-01-03 00:00:00    5
1    B  2023-01-02 00:00:00  2023-01-04 00:00:00   17
2    C  2023-01-03 00:00:00  2023-01-05 00:00:00   29
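
Since the question asks about efficiency, one possible alternative that is not taken from either answer is to precompute running totals per column and subtract two lookups per row, which avoids both the row-by-row apply and the large merged frame. The sketch below assumes that the dates in df2 are unique and that every Name in df1 is a column of df2; the names cum, totals, lo, hi, and cols are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Date1': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03']),
    'Date2': pd.to_datetime(['2023-01-03', '2023-01-04', '2023-01-05']),
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05']),
    'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15],
})

# Running totals per column, in date order
cum = df2.set_index('Date').sort_index().cumsum()

# Prepend a row of zeros so that totals[k] is the sum over the first k dates
totals = np.vstack([np.zeros((1, cum.shape[1])), cum.to_numpy()])

dates = cum.index.to_numpy()
# Number of df2 dates <= each bound (side='right' counts an exact match)
lo = np.searchsorted(dates, df1['Date1'].to_numpy(), side='right')
hi = np.searchsorted(dates, df1['Date2'].to_numpy(), side='right')

# Column position of each row's Name (assumed to exist in df2)
cols = cum.columns.get_indexer(df1['Name'])

# Sum over Date1 < Date <= Date2 = (total up to Date2) - (total up to Date1)
df1['Sum'] = totals[hi, cols] - totals[lo, cols]
print(df1)
```

The result is a float column because of the zero row; it can be cast back with astype(int) if the inputs are known to be integers. A Name missing from df2 would make get_indexer return -1 and silently pick the last column, so that case should be checked separately in real data.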
