numpy: Array averages for "hour of the date in year" in Python

Posted by pgx2nnw8 on 2023-03-30 in Python


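I have two arrays:

1. A 3D numpy array with shape (1, 87648, 100) and dtype float64
2. A 1D array with shape (87648,) of type pandas DatetimeIndex

The values along axis=1 of the 3D array correspond to the hourly sequence of datetimes in the 1D array. The total duration is 10 years, 2 of which are leap years (i.e. 8760 * 8 + 8784 * 2 = 87648). There is no daylight saving time, so every day has exactly 24 corresponding values.
I want to compute the average for each hour of the year across the 10 years of data. That is, over the 10 years I want to average all hour-0 values of January 1, all hour-1 values of January 1, and so on, so that I end up with 8784 averages. Each is the mean of 10 data points, except for the 24 hours of February 29, which are each the mean of 2 data points.
To put it more precisely, the desired result is a 3D array with shape (1, 8784, 100) and dtype float64.
Let the 3D array be called "volume" and the 1D datetime array "datetime_array". My incomplete last attempt went in the following direction, but I am really stuck on this problem: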

hour_of_year = np.array([dt.hour + (dt.dayofyear - 1) * 24 for dt in datetime_array])
volume_by_hour = np.reshape(volume, (volume.shape[0], volume.shape[1] / 24, volume.shape[2], 24))
profile = np.array([np.mean(group, axis=0) for i, group in np.ndenumerate(volume)]).reshape(???)

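The problem already starts with the first line: it does not distinguish between dates, so hours 1417 to 1440 correspond to March 1 in a normal year but to February 29 in a leap year.
If handling leap years makes this much more complicated, the distinction is not that important and can be ignored.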
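
To illustrate the misalignment described above, here is a minimal sketch. The helper name `hour_of_year` and the sample timestamps are chosen only for illustration; the formula is the one from the first line of the attempt (zero-based, so the hours in question run from 1416 to 1439).

```python
import pandas as pd

def hour_of_year(ts):
    """Naive zero-based hour-of-year index, as in the first line of the attempt."""
    return ts.hour + (ts.dayofyear - 1) * 24

# Illustrative timestamps: midnight on 1 March in a normal year and in a leap year.
normal = pd.Timestamp("2015-03-01 00:00")
leap = pd.Timestamp("2016-03-01 00:00")
feb29 = pd.Timestamp("2016-02-29 00:00")

print(hour_of_year(normal))  # 1416
print(hour_of_year(leap))    # 1440

# 29 February of the leap year collides with 1 March of the normal year.
print(hour_of_year(feb29) == hour_of_year(normal))  # True
```

Because the index is shifted by 24 for every date after February 28 in a leap year, grouping on it would mix different calendar dates.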

numpy

Source: https://stackoverflow.com/questions/75834409/array-averages-for-hour-of-the-date-in-year-in-python

1 Answer


w7t8yxp5 #1

Given that you are using a pd.DatetimeIndex, you may find pandas operations more useful here than numpy alone. Here is one attempt:

import numpy as np
import pandas as pd

volume = np.random.rand(1, 87648, 100)
index = pd.date_range("2013-01-01", "2023-01-01", freq="H", inclusive="left")

df = pd.DataFrame(
    volume.squeeze(), # Squeeze to temporarily get rid of the leading single dimension
    index=index
)

out = df.groupby(df.index.strftime("%m-%d %H")).mean()

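Here I use pd.DatetimeIndex.strftime as a way to uniquely identify the rows that should be grouped together when averaging, but you could also group by [df.index.month, df.index.day, df.index.hour].
The output looks like this: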

                 0         1         2         3         4  ...        95        96        97        98        99
01-01 00  0.352494  0.616882  0.475246  0.543492  0.482271  ...  0.431965  0.292609  0.593101  0.465737  0.515728
01-01 01  0.602057  0.503248  0.496831  0.561276  0.476792  ...  0.446117  0.420354  0.494491  0.433746  0.588248
01-01 02  0.574717  0.474213  0.558099  0.598167  0.512984  ...  0.511152  0.438548  0.464368  0.598788  0.478550
01-01 03  0.380682  0.680109  0.662305  0.498367  0.659267  ...  0.537061  0.617603  0.545073  0.527590  0.599664
01-01 04  0.616761  0.456948  0.700690  0.564529  0.495705  ...  0.648317  0.393420  0.479093  0.512675  0.323712
...            ...       ...       ...       ...       ...  ...       ...       ...       ...       ...       ...
12-31 19  0.373228  0.471034  0.506665  0.444749  0.460461  ...  0.558895  0.538552  0.389275  0.418527  0.508002
12-31 20  0.435194  0.454427  0.506929  0.431770  0.391848  ...  0.363227  0.558908  0.607851  0.494579  0.473551
12-31 21  0.526382  0.558862  0.560605  0.357882  0.319049  ...  0.568854  0.443583  0.421765  0.475142  0.480418
12-31 22  0.628438  0.367111  0.629999  0.501194  0.499882  ...  0.391688  0.274963  0.417083  0.433642  0.554901
12-31 23  0.511908  0.570115  0.379889  0.492934  0.572257  ...  0.538664  0.675786  0.477229  0.535941  0.518781

[8784 rows x 100 columns]
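
The list-of-keys grouping mentioned above can be sketched as follows. This is only an illustration on random data: the names `out_multi` and `counts` are made up here, and the index is built with `periods=87648` and a `pd.Timedelta` frequency rather than the string alias used in the answer, which should cover the same 2013 to 2022 range.

```python
import numpy as np
import pandas as pd

# Random stand-in data with the shapes from the question (assumption).
rng = np.random.default_rng(0)
index = pd.date_range("2013-01-01", periods=87648, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(rng.random((87648, 100)), index=index)

# Group by (month, day, hour) instead of a formatted string key.
keys = [df.index.month, df.index.day, df.index.hour]
grouped = df.groupby(keys)

out_multi = grouped.mean()   # one row per (month, day, hour)
counts = grouped.size()      # number of data points behind each average

print(out_multi.shape)              # (8784, 100)
print(int(counts.loc[(1, 1, 0)]))   # 10
print(int(counts.loc[(2, 29, 0)]))  # 2
```

The group sizes confirm what the question expects: 10 data points per hour of the year, and 2 for each hour of February 29.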

You can get `out` back as a numpy array with a leading singleton dimension:

out = out.to_numpy()[None]
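
If leap days can be ignored, as the question allows, a numpy-only variant is also possible. The following is a rough sketch on random data, not part of the original answer: it drops every hour of February 29 so that each year has exactly 8760 hours, then reshapes and averages over the year axis. The names `keep`, `trimmed`, `n_years`, and `profile` are illustrative.

```python
import numpy as np
import pandas as pd

# Random stand-in data with the shapes from the question (assumption).
rng = np.random.default_rng(0)
datetime_array = pd.date_range("2013-01-01", periods=87648, freq=pd.Timedelta(hours=1))
volume = rng.random((1, 87648, 100))

# Boolean mask that drops every hour falling on 29 February.
keep = ~((datetime_array.month == 2) & (datetime_array.day == 29))
trimmed = volume[:, keep, :]  # shape (1, 87600, 100)

# Every remaining year has 8760 hours, so axis 1 splits cleanly into (years, hours).
n_years = trimmed.shape[1] // 8760
profile = trimmed.reshape(volume.shape[0], n_years, 8760, volume.shape[2]).mean(axis=1)

print(profile.shape)  # (1, 8760, 100)
```

This relies on the data being strictly chronological and complete. The result has 8760 rather than 8784 hours per year, so it differs from the shape requested in the question.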

2023-03-30
