Pandas: Groupby and condition

ezykj2lf  posted on 2023-06-20  in Other

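I have the mod_df DataFrame below, which holds various symbols and their corresponding prices. 'Lower Low' is a column used to flag lower-low price values. The code below applies the condition that fills the mod_df['Lower Low'] column, and it works fine.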

import numpy as np
import pandas as pd

data = {'Symbol': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'Date': ['2023-05-15 15:00:00', '2023-05-15 22:00:00', '2023-05-16 07:00:00', '2023-05-16 14:00:00',
                 '2023-05-17 07:00:00', '2023-05-17 20:00:00', '2023-05-18 02:00:00', '2023-05-18 16:00:00',
                 '2023-05-19 07:00:00', '2023-05-22 09:00:00', '2023-05-15 00:00:00',
                 '2023-05-16 12:00:00', '2023-05-17 06:00:00', '2023-05-18 02:00:00'],
        'Price': [0.90065, 0.90042, 0.89841, 0.89462, 0.89437, 0.89455, 0.89248, 0.89013, 0.89405, 0.89424, 0.59601,
                  0.59548, 0.59444, 0.59527],
        'Helper_L': [0, 0, 0, 0, 0, 1, 1, 1, 2, 3, 0, 0, 0, 1]}

mod_df = pd.DataFrame(data)
mod_df['Lower Low'] = np.where((mod_df['Helper_L'] != mod_df['Helper_L'].shift(-1))
                               & (mod_df['Price'] < mod_df['Price'].shift(1))
                               & (mod_df['Price'] < mod_df['Price'].shift(-1)), 'Lower Low', '')

print(mod_df)

Symbol       Date         Price  Helper_L Lower Low
A   15-05-2023 15:00:00  0.90065    0   
A   15-05-2023 22:00:00  0.90042    0   
A   16-05-2023 07:00:00  0.89841    0   
A   16-05-2023 14:00:00  0.89462    0   
A   17-05-2023 07:00:00  0.89437    0   Lower Low
A   17-05-2023 20:00:00  0.89455    1   
A   18-05-2023 02:00:00  0.89248    1   
A   18-05-2023 16:00:00  0.89013    1   Lower Low
A   19-05-2023 07:00:00  0.89405    2   
A   22-05-2023 09:00:00  0.89424    3   
B   15-05-2023 00:00:00  0.59601    0   
B   16-05-2023 12:00:00  0.59548    0   
B   17-05-2023 06:00:00  0.59444    0   Lower Low
B   18-05-2023 02:00:00  0.59527    1

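Because the Symbol column contains several different symbols, I tried to compute mod_df['Lower Low'] per symbol using groupby and a lambda function, as in the code below, but I get the error KeyError: 'Helper_L'. Can anyone suggest how to fix this?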

mod_df['Lower Low'] = mod_df.groupby('Symbol')[['Helper_L','Price']]\
    .transform(lambda df: np.where((df['Helper_L'] !=df['Helper_L'].shift(-1)) & (df['Price'] < df['Price'].shift(1))
                                   & (df['Price'] < df['Price'].shift(-1)), 'Lower Low', ''))

dgsult0t  #1

There is no need to apply a lambda function. You can perform groupby + shift on the required columns and then use np.where to select the values efficiently.

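# Group once by symbol so that every shift stays inside its own symbol
g = mod_df.groupby('Symbol')

mod_df['Lower Low'] = np.where(
    (mod_df['Helper_L'] != g['Helper_L'].shift(-1))   # Helper_L changes on the next row
    & (mod_df['Price'] < g['Price'].shift( 1))        # lower than the previous price
    & (mod_df['Price'] < g['Price'].shift(-1)),       # lower than the next price
    'Lower Low', ''
)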

Result

   Symbol                 Date    Price  Helper_L  Lower Low
0       A  2023-05-15 15:00:00  0.90065         0           
1       A  2023-05-15 22:00:00  0.90042         0           
2       A  2023-05-16 07:00:00  0.89841         0           
3       A  2023-05-16 14:00:00  0.89462         0           
4       A  2023-05-17 07:00:00  0.89437         0  Lower Low
5       A  2023-05-17 20:00:00  0.89455         1           
6       A  2023-05-18 02:00:00  0.89248         1           
7       A  2023-05-18 16:00:00  0.89013         1  Lower Low
8       A  2023-05-19 07:00:00  0.89405         2           
9       A  2023-05-22 09:00:00  0.89424         3           
10      B  2023-05-15 00:00:00  0.59601         0           
11      B  2023-05-16 12:00:00  0.59548         0           
12      B  2023-05-17 06:00:00  0.59444         0  Lower Low
13      B  2023-05-18 02:00:00  0.59527         1
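
On the sample data the grouped and ungrouped versions happen to give the same flags, so it may not be obvious what the groupby buys you. The sketch below uses made-up toy data (not from the question) in which symbol A ends on its lowest price; `lower_low` is a hypothetical helper introduced only for this illustration. With a plain shift, the last row of A is compared against the first row of B and gets flagged; with the grouped shift, the missing neighbour becomes NaN, the comparison evaluates to False, and the row is left unflagged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: A ends on its lowest price, then B starts higher.
toy = pd.DataFrame({
    'Symbol':   ['A', 'A', 'A', 'B', 'B', 'B'],
    'Price':    [3.0, 2.0, 1.0, 5.0, 4.0, 6.0],
    'Helper_L': [0, 0, 1, 0, 0, 1],
})

def lower_low(df, prev_price, next_price, next_helper):
    """Apply the same three-part condition as the answer to the given shifted Series."""
    mask = ((df['Helper_L'] != next_helper)
            & (df['Price'] < prev_price)
            & (df['Price'] < next_price))
    return np.where(mask, 'Lower Low', '')

# Plain shift: neighbours are taken across the A/B boundary.
plain = lower_low(toy,
                  toy['Price'].shift(1),
                  toy['Price'].shift(-1),
                  toy['Helper_L'].shift(-1))

# Grouped shift: neighbours outside the symbol are NaN.
g = toy.groupby('Symbol')
grouped = lower_low(toy,
                    g['Price'].shift(1),
                    g['Price'].shift(-1),
                    g['Helper_L'].shift(-1))

print(plain.tolist())
print(grouped.tolist())
```

Here the plain version flags row 2 (the last A row) as well as row 4, while the grouped version flags only row 4.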
