Pandas: set values from another DataFrame based on date

tvmytwxo posted on 2023-03-06 in Other

This code:

import numpy
import pandas

periods = 5 * 3
df1 = pandas.DataFrame(dict(
  v1=numpy.arange(2, 2 + periods) * 2,
  v2=numpy.arange(3, 3 + periods) * 3),
  index=pandas.date_range('2023-01-01', periods=periods, freq='8H'))
print(df1)

periods = 3
df2 = pandas.DataFrame(dict(
  v3=numpy.arange(4, 4 + periods) * 4,
  v4=numpy.arange(5, 5 + periods) * 5),
  index=pandas.date_range('2023-01-02', periods=periods, freq='2D'))
print(df2)

df1.loc[df1.index.date, ['v3', 'v4']] = df2
print(df1)

Result:

                     v1  v2
2023-01-01 00:00:00   4   9
2023-01-01 08:00:00   6  12
2023-01-01 16:00:00   8  15
2023-01-02 00:00:00  10  18
2023-01-02 08:00:00  12  21
2023-01-02 16:00:00  14  24
2023-01-03 00:00:00  16  27
2023-01-03 08:00:00  18  30
2023-01-03 16:00:00  20  33
2023-01-04 00:00:00  22  36
2023-01-04 08:00:00  24  39
2023-01-04 16:00:00  26  42
2023-01-05 00:00:00  28  45
2023-01-05 08:00:00  30  48
2023-01-05 16:00:00  32  51
            v3  v4
2023-01-02  16  25
2023-01-04  20  30
2023-01-06  24  35
                     v1  v2    v3    v4
2023-01-01 00:00:00   4   9   NaN   NaN
2023-01-01 08:00:00   6  12   NaN   NaN
2023-01-01 16:00:00   8  15   NaN   NaN
2023-01-02 00:00:00  10  18  16.0  25.0
2023-01-02 08:00:00  12  21  16.0  25.0
2023-01-02 16:00:00  14  24  16.0  25.0
2023-01-03 00:00:00  16  27   NaN   NaN
2023-01-03 08:00:00  18  30   NaN   NaN
2023-01-03 16:00:00  20  33   NaN   NaN
2023-01-04 00:00:00  22  36  20.0  30.0
2023-01-04 08:00:00  24  39  20.0  30.0
2023-01-04 16:00:00  26  42  20.0  30.0
2023-01-05 00:00:00  28  45   NaN   NaN
2023-01-05 08:00:00  30  48   NaN   NaN
2023-01-05 16:00:00  32  51   NaN   NaN

where each value from df2 is copied into df1 whenever a df1 date matches a df2 date (i.e. ignoring the time component).
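In pandas terms, "ignoring the time component" can mean two different things, and the difference matters for the rest of this thread. A minimal sketch with two made-up timestamps on the same day: DatetimeIndex.date drops the time and returns plain datetime.date objects, while DatetimeIndex.normalize() keeps datetime64 values but resets each one to 00:00.

```python
import pandas as pd

# Hypothetical sample: two times on the same calendar day
idx = pd.DatetimeIndex(['2023-01-02 01:00', '2023-01-02 17:00'])

# .date: NumPy object array of plain datetime.date values (time dropped)
dates = idx.date

# .normalize(): still a DatetimeIndex, every time reset to 00:00
midnights = idx.normalize()

print(dates)
print(midnights)
```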
However, if the df1 index is changed to have a time component (01:00 in this example), i.e.:

import numpy
import pandas

periods = 5 * 3
df1 = pandas.DataFrame(dict(
  v1=numpy.arange(2, 2 + periods) * 2,
  v2=numpy.arange(3, 3 + periods) * 3),
  index=pandas.date_range('2023-01-01 01:00', periods=periods, freq='8H'))
print(df1)

periods = 3
df2 = pandas.DataFrame(dict(
  v3=numpy.arange(4, 4 + periods) * 4,
  v4=numpy.arange(5, 5 + periods) * 5),
  index=pandas.date_range('2023-01-02', periods=periods, freq='2D'))
print(df2)

df1.loc[df1.index.date, ['v3', 'v4']] = df2
print(df1)

Result:

                     v1  v2
2023-01-01 01:00:00   4   9
2023-01-01 09:00:00   6  12
2023-01-01 17:00:00   8  15
2023-01-02 01:00:00  10  18
2023-01-02 09:00:00  12  21
2023-01-02 17:00:00  14  24
2023-01-03 01:00:00  16  27
2023-01-03 09:00:00  18  30
2023-01-03 17:00:00  20  33
2023-01-04 01:00:00  22  36
2023-01-04 09:00:00  24  39
2023-01-04 17:00:00  26  42
2023-01-05 01:00:00  28  45
2023-01-05 09:00:00  30  48
2023-01-05 17:00:00  32  51
            v3  v4
2023-01-02  16  25
2023-01-04  20  30
2023-01-06  24  35
...
KeyError: "None of [Index([2023-01-01, 2023-01-01, 2023-01-01, 2023-01-02, 2023-01-02, 2023-01-02,\n       2023-01-03, 2023-01-03, 2023-01-03, 2023-01-04, 2023-01-04, 2023-01-04,\n       2023-01-05, 2023-01-05, 2023-01-05],\n      dtype='object')] are in the [index]"

Clearly,

df1.loc[df1.index.date, ['v3', 'v4']] = df2

is not the proper way to set values based on date (i.e. ignoring the time).
Questions:

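  • Why does it stop working when there is a time component?
  • Given that it doesn't work with a time component, why does it work without one, matching *all* times (i.e. not just 00:00)?
  • What is the correct way to achieve my goal?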

Answer #1 (2wnc66cl)

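In the first example you were lucky: some of the datetimes fall exactly at 00:00:00, and each datetime.date label passed to .loc is matched as midnight of that day, so .loc can find those rows. In the second example there are no such datetimes, so no row can be selected and .loc raises a KeyError.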
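The explanation can be checked directly. This sketch rebuilds df1 from the question (make_df1 is a hypothetical helper, and freq=pd.Timedelta(hours=8) stands in for the question's '8H'), counts how many index entries sit exactly at midnight for each start time, and confirms that a list lookup with midnight labels raises KeyError when there are none. It assumes, as the answer implies, that a datetime.date label is looked up as midnight of that day; the Timestamp comparison only illustrates that conversion.

```python
import datetime

import numpy as np
import pandas as pd


def make_df1(start):
    """Hypothetical helper: rebuild df1 from the question; only the start varies."""
    periods = 5 * 3
    return pd.DataFrame(
        dict(v1=np.arange(2, 2 + periods) * 2,
             v2=np.arange(3, 3 + periods) * 3),
        # Timedelta(hours=8) is the same step as freq='8H' in the question
        index=pd.date_range(start, periods=periods, freq=pd.Timedelta(hours=8)))


# A datetime.date converted to a Timestamp becomes midnight of that day
print(pd.Timestamp(datetime.date(2023, 1, 2)) == pd.Timestamp('2023-01-02 00:00'))

hits_by_start = {}
for start in ['2023-01-01', '2023-01-01 01:00']:
    df = make_df1(start)
    # Count index entries that equal 00:00 of their own calendar day
    hits_by_start[start] = int(df.index.isin(df.index.normalize()).sum())
print(hits_by_start)

# With the 01:00 start no midnight label exists, so the list lookup fails
shifted = make_df1('2023-01-01 01:00')
try:
    shifted.loc[shifted.index.normalize()]
    raised = False
except KeyError:
    raised = True
print('KeyError raised:', raised)
```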
The correct way is probably to use merge after normalizing the index:

out = df1.merge(df2, left_on=df1.index.normalize(), right_index=True, how='left')
print(out)

# Output
                     v1  v2    v3    v4
2023-01-01 01:00:00   4   9   NaN   NaN
2023-01-01 09:00:00   6  12   NaN   NaN
2023-01-01 17:00:00   8  15   NaN   NaN
2023-01-02 01:00:00  10  18  16.0  25.0
2023-01-02 09:00:00  12  21  16.0  25.0
2023-01-02 17:00:00  14  24  16.0  25.0
2023-01-03 01:00:00  16  27   NaN   NaN
2023-01-03 09:00:00  18  30   NaN   NaN
2023-01-03 17:00:00  20  33   NaN   NaN
2023-01-04 01:00:00  22  36  20.0  30.0
2023-01-04 09:00:00  24  39  20.0  30.0
2023-01-04 17:00:00  26  42  20.0  30.0
2023-01-05 01:00:00  28  45   NaN   NaN
2023-01-05 09:00:00  30  48   NaN   NaN
2023-01-05 17:00:00  32  51   NaN   NaN
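A reindex-based variant should produce the same table without merge. This is a swapped-in alternative, not the answer's method: df2 is reindexed on the normalized df1 index (repeated target labels are allowed because df2's own index is unique), re-labelled with df1's original timestamps, and joined. The data is rebuilt from the question, with freq=pd.Timedelta(hours=8) standing in for '8H'.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (01:00 start, so df1 has no midnight rows)
periods = 5 * 3
df1 = pd.DataFrame(
    dict(v1=np.arange(2, 2 + periods) * 2,
         v2=np.arange(3, 3 + periods) * 3),
    index=pd.date_range('2023-01-01 01:00', periods=periods,
                        freq=pd.Timedelta(hours=8)))

periods = 3
df2 = pd.DataFrame(
    dict(v3=np.arange(4, 4 + periods) * 4,
         v4=np.arange(5, 5 + periods) * 5),
    index=pd.date_range('2023-01-02', periods=periods, freq='2D'))

# Look up each df1 row's calendar day (as a midnight Timestamp) in df2;
# days missing from df2 come back as NaN rows
aligned = df2.reindex(df1.index.normalize())

# Same length and order as df1, so swap in df1's real timestamps and join
aligned.index = df1.index
out = df1.join(aligned)
print(out)
```

As with the merge version, days missing from df2 come out as NaN, which is why v3 and v4 become float columns.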
Update

But why are rows with non-00:00 times selected?
They aren't. You reindex the DataFrame, which duplicates the rows that have the 00:00 time:

>>> df1.loc[df1.index.date]
            v1  v2
2023-01-01   4   9
2023-01-01   4   9
2023-01-01   4   9
2023-01-02  10  18  # <- original 2023-01-02 00:00:00 (10, 18)
2023-01-02  10  18  # <- dupe, not 2023-01-02 08:00:00 (12, 21)
2023-01-02  10  18  # <- dupe, not 2023-01-02 16:00:00 (14, 24)
2023-01-03  16  27
2023-01-03  16  27
2023-01-03  16  27
2023-01-04  22  36
2023-01-04  22  36
2023-01-04  22  36
2023-01-05  28  45
2023-01-05  28  45
2023-01-05  28  45
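The same duplication can be reproduced with normalize(), which yields midnight Timestamps directly; using it in place of .date here is an illustrative substitution. The df1 below is the question's first version (00:00 start), with freq=pd.Timedelta(hours=8) standing in for '8H'.

```python
import numpy as np
import pandas as pd

# The question's first df1: rows at 00:00, 08:00 and 16:00 each day
periods = 5 * 3
df1 = pd.DataFrame(
    dict(v1=np.arange(2, 2 + periods) * 2,
         v2=np.arange(3, 3 + periods) * 3),
    index=pd.date_range('2023-01-01', periods=periods,
                        freq=pd.Timedelta(hours=8)))

# Each midnight label appears three times in the key...
keys = df1.index.normalize()

# ...so .loc returns the matching midnight row three times, in key order
picked = df1.loc[keys]
print(picked)
```

Only the midnight rows (v1 = 4, 10, 16, 22, 28) ever appear; the 08:00 and 16:00 rows are never selected.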
