numpy 如何按天对OHLC分钟数据进行分组?

vtwuwzda  于 2022-11-10  发布在  其他
关注(0)|答案(1)|浏览(121)

我有一堆分钟数据,如下所示:

index                  timestamp    open    high    low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
0       2021-11-02 13:30:00+00:00  2021-11-02 13:30:00+00:00  12.000  12.055  12.00  ...           62  12.003693    AMCR       47208              0
1       2021-11-02 13:31:00+00:00  2021-11-02 13:31:00+00:00  12.040  12.070  12.04  ...          117  12.055675    AMCR       67656              0
2       2021-11-02 13:32:00+00:00  2021-11-02 13:32:00+00:00  12.050  12.050  12.03  ...           64  12.040067    AMCR       74411              0
3       2021-11-02 13:33:00+00:00  2021-11-02 13:33:00+00:00  12.035  12.035  12.01  ...           95  12.021537    AMCR       83597              0
4       2021-11-02 13:34:00+00:00  2021-11-02 13:34:00+00:00  12.035  12.035  12.02  ...           16  12.028696    AMCR       84855              0
...                           ...                        ...     ...     ...    ...  ...          ...        ...     ...         ...            ...
429000  2022-10-21 19:59:00+00:00  2022-10-21 19:59:00+00:00  29.850  29.940  29.85  ...          298  29.894083    YETI      688656              0
429001  2022-10-21 20:00:00+00:00  2022-10-21 20:00:00+00:00  29.890  29.890  29.89  ...           15  29.890000    YETI      776882              0
429002  2022-10-21 20:08:00+00:00  2022-10-21 20:08:00+00:00  29.890  29.890  29.89  ...            1  29.890000    YETI      781148              0
429003  2022-10-21 22:06:00+00:00  2022-10-21 22:06:00+00:00  30.060  30.060  30.06  ...            3  30.060000    YETI      782348              0
429004  2022-10-21 22:44:00+00:00  2022-10-21 22:44:00+00:00  30.060  30.060  30.06  ...            1  30.060000    YETI      783148              0

[429005 rows x 12 columns]

然后我摸索着一天又一天的时间:

df['timestamp'] = pd.to_datetime(df['timestamp'])
days = df.groupby(df['timestamp'].dt.normalize().unique())

然后我一天一天地循环:

list_of_df = []
for index, name in days:
    list_of_df.append(name)
final = pd.concat(list_of_df)
print(final[['symbol', 'timestamp']])

它返回:

symbol                 timestamp
95    AMCR 2021-11-02 15:05:00+00:00
186   AMCR 2021-11-02 16:37:00+00:00
187   AMCR 2021-11-02 16:38:00+00:00
0     AMCR 2021-11-02 13:30:00+00:00
212   AMCR 2021-11-02 17:03:00+00:00
..     ...                       ...
210   AMCR 2021-11-02 17:01:00+00:00
211   AMCR 2021-11-02 17:02:00+00:00
138   AMCR 2021-11-02 15:49:00+00:00
192   AMCR 2021-11-02 16:43:00+00:00
58    AMCR 2021-11-02 14:28:00+00:00

[230 rows x 2 columns]

我怎么才能让它只返回唯一的日子呢?E.g

symbol                 timestamp
95    AMCR 2021-11-02 00:00:00+00:00
97    APPL 2021-11-02 00:00:00+00:00
100   APPL 2021-11-03 00:00:00+00:00
102   APPL 2021-11-06 00:00:00+00:00

[230 rows x 2 columns]
tv6aics1

tv6aics11#

给定(我更改了时间戳以获得更多变化):

timestamp    open    high    low  symbol
0  2021-11-02 13:30:00+00:00  12.000  12.055  12.00    AMCR
1  2021-11-02 13:31:00+00:00  12.040  12.070  12.04    AMCR
2  2021-11-08 13:32:00+00:00  12.050  12.050  12.03    YETI
3  2021-11-03 13:33:00+00:00  12.035  12.035  12.01    AMCR
4  2022-10-21 19:59:00+00:00  12.035  12.035  12.02    YETI

假设您希望每个符号都有日期,请执行以下操作:

out = (df[['symbol', 'timestamp']]
         .assign(timestamp=df.timestamp.dt.normalize())
         .drop_duplicates())
print(out)

产出:

symbol                 timestamp
0   AMCR 2021-11-02 00:00:00+00:00
2   YETI 2021-11-08 00:00:00+00:00
3   AMCR 2021-11-03 00:00:00+00:00
4   YETI 2022-10-21 00:00:00+00:00

相关问题