numpy 如何使用日和符号过滤分钟数据?

iqxoj9l9  于 2022-11-10  发布在  其他
关注(0)|答案(1)|浏览(138)

假设我有一个分钟OHLC的 Dataframe :

index                  timestamp     open     high  ...       vwap  symbol  cum_volume  cum_volume_25k
0       2020-11-05 09:00:00+00:00  2020-11-05 09:00:00+00:00  11.2000  11.2000  ...  11.199735     AAL        1360               0
1       2020-11-05 09:03:00+00:00  2020-11-05 09:03:00+00:00  11.0900  11.0900  ...  11.090000     AAL        1760               0
2       2020-11-05 09:07:00+00:00  2020-11-05 09:07:00+00:00  11.1600  11.1800  ...  11.167725     AAL        2538               0
3       2020-11-05 09:08:00+00:00  2020-11-05 09:08:00+00:00  11.1800  11.1800  ...  11.180000     AAL        3309               0
4       2020-11-05 09:09:00+00:00  2020-11-05 09:09:00+00:00  11.1800  11.1800  ...  11.180000     AAL        5309               0
...                           ...                        ...      ...      ...  ...        ...     ...         ...             ...
471734  2022-10-24 20:33:00+00:00  2022-10-24 20:33:00+00:00  94.6000  94.6000  ...  94.601900    SPOT     6094688               0
471735  2022-10-24 20:41:00+00:00  2022-10-24 20:41:00+00:00  94.6600  94.6600  ...  94.648500    SPOT     6094988               0
471736  2022-10-24 20:49:00+00:00  2022-10-24 20:49:00+00:00  94.0000  94.0000  ...  93.959055    SPOT     6095242               0
471737  2022-10-24 22:45:00+00:00  2022-10-24 22:45:00+00:00  94.5000  94.5000  ...  94.500000    SPOT     6095602               0
471738  2022-10-24 23:59:00+00:00  2022-10-24 23:59:00+00:00  94.3999  94.3999  ...  94.399162    SPOT     6096103               0

[471739 rows x 12 columns]

以及符号和日期的DF:

index symbol                  timestamp
0          0    AAL  2020-11-05 00:00:00+00:00
1        725    AAL  2020-11-06 00:00:00+00:00
2        773    AAL  2020-11-09 00:00:00+00:00
3       1594    AAL  2020-11-10 00:00:00+00:00
4       1654    AAL  2020-11-23 00:00:00+00:00
...      ...    ...                        ...
1233  469618   SPOT  2022-07-15 00:00:00+00:00
1234  470017   SPOT  2022-07-27 00:00:00+00:00
1235  470535   SPOT  2022-10-04 00:00:00+00:00
1236  470936   SPOT  2022-10-17 00:00:00+00:00
1237  471332   SPOT  2022-10-24 00:00:00+00:00

[1238 rows x 3 columns]

如何使用第二个 Dataframe 来过滤第一个 Dataframe ,以便只得到包含第二个 Dataframe 的符号和日期的微小数据的DF?

zc0qhyus

zc0qhyus1#

作为一种选择,您可以在每个 Dataframe 中创建带有日期的列。

dfd['day'] = dfd['timestamp'].dt.date
dfm['day'] = dfm['timestamp'].dt.date

在分钟表中按“日”栏分组。创建一个空的 Dataframe ,并在有日期和符号匹配的地方填写。

import pandas as pd

dfd['timestamp'] =  pd.to_datetime(dfd['timestamp'], errors='raise')#day
dfm['timestamp'] =  pd.to_datetime(dfm['timestamp'], errors='raise')#minute
dfd['day'] = dfd['timestamp'].dt.date
dfm['day'] = dfm['timestamp'].dt.date

df = pd.DataFrame()

def my_func(x):
    global df
    ind = x.index[0]
    aaa = dfd[(dfd['day'] == x['day'][ind]) & (dfd['symbol'] == x['symbol'][ind])]
    if len(aaa) > 0:#match found add these rows
        df = pd.concat([df, x])

dfm.groupby('day').apply(my_func)

print(df)

输出

timestamp     open     high       vwap symbol         day
0 2020-11-05 09:00:00+00:00  11.2000  11.2000  11.199735    AAL  2020-11-05
1 2020-11-05 09:03:00+00:00  11.0900  11.0900  11.090000    AAL  2020-11-05
2 2020-11-05 09:07:00+00:00  11.1600  11.1800  11.167725    AAL  2020-11-05
3 2020-11-05 09:08:00+00:00  11.1800  11.1800  11.180000    AAL  2020-11-05
4 2020-11-05 09:09:00+00:00  11.1800  11.1800  11.180000    AAL  2020-11-05
5 2022-10-24 20:33:00+00:00  94.6000  94.6000  94.601900   SPOT  2022-10-24
6 2022-10-24 20:41:00+00:00  94.6600  94.6600  94.648500   SPOT  2022-10-24
7 2022-10-24 20:49:00+00:00  94.0000  94.0000  93.959055   SPOT  2022-10-24
8 2022-10-24 22:45:00+00:00  94.5000  94.5000  94.500000   SPOT  2022-10-24
9 2022-10-24 23:59:00+00:00  94.3999  94.3999  94.399162   SPOT  2022-10-24

相关问题