pandas: computing the mean, variance, and standard deviation of time series data in Python

cnh2zyt3  posted on 2023-04-28 in Python

I have data collected from a sensor that looks like this:

sec   nanosec  value
1001  1        0.2
1001  2        0.2
1001  3        0.2
1002  1        0.1
1002  2        0.2
1002  3        0.1
1003  1        0.2
1003  2        0.2
1003  3        0.1
1004  1        0.2
1004  2        0.2
1004  3        0.2
1004  4        0.1

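I want to compute the average, the standard deviation, and some other statistics, such as the maximum and minimum of a column, over every 2 seconds. So the mean for (1001, 1002) = 0.167 and the mean for (1003, 1004) = 0.17.
From the tutorial http://earthpy.org/pandas-basics.html, I think I should convert the data to a time series and use rolling_mean from pandas, but I am not familiar with time series data, so I am not sure whether that is the right approach. Also, how do I specify the frequency for the conversion here, given that the first second has fewer observations? In the actual data, I have fewer than 100 readings in second 1001 and then 100 observations in second 1002.
I could also do a simple groupby on the seconds, but that would group the readings per second rather than per 2 seconds. How can I combine the observations of 2 consecutive groups from the groupby and then run the analysis?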

lztngnrs1#

I think you can first convert the column sec with to_timedelta, then use set_index and resample by 2 seconds (2S):

import pandas as pd

# df holds the sensor readings with columns sec, nanosec, value
df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1
print (df.value.resample('2S').mean())
sec
00:16:41    0.166667
00:16:43    0.171429
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').std())
sec
00:16:41    0.051640
00:16:43    0.048795
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').max())
sec
00:16:41    0.2
00:16:43    0.2
00:16:45    NaN
Freq: 2S, Name: value, dtype: float64

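You may need to change base in resample. Note that base was deprecated in pandas 1.1 in favor of offset and removed in pandas 2.0, so the calls below apply to older releases: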

print (df.value.resample('2S', base=1).mean())
sec
00:16:42    0.166667
00:16:44    0.171429
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).std())
sec
00:16:42    0.051640
00:16:44    0.048795
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).max())
sec
00:16:42    0.2
00:16:44    0.2
00:16:46    NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).mean())
sec
00:16:43    0.166667
00:16:45    0.171429
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).std())
sec
00:16:43    0.051640
00:16:45    0.048795
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).max())
sec
00:16:43    0.2
00:16:45    0.2
00:16:47    NaN
Freq: 2S, Name: value, dtype: float64
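
The separate mean, std, and max calls above can also be collapsed into one pass. The following is only a minimal sketch, assuming a recent pandas release: the sample frame is rebuilt by hand from the question, the lowercase alias '2s' is used in place of '2S', and the trailing dropna is an assumption added to discard an empty final bin like the NaN rows printed above.

```python
import pandas as pd

# Sample readings copied from the question (rebuilt by hand for this sketch).
df = pd.DataFrame({
    "sec": [1001] * 3 + [1002] * 3 + [1003] * 3 + [1004] * 4,
    "nanosec": [1, 2, 3] * 3 + [1, 2, 3, 4],
    "value": [0.2, 0.2, 0.2, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 0.1],
})

# Index the values by elapsed time so that resample can bin them.
series = df.set_index(pd.to_timedelta(df["sec"], unit="s"))["value"]

# One call computes all requested statistics per 2-second bin.
stats = series.resample("2s").agg(["mean", "std", "min", "max"])

# Drop bins that contain no readings at all.
stats = stats.dropna(how="all")
print(stats)
```

With a timedelta index the bins start at the first reading, so 1001 and 1002 fall into the first bin and 1003 and 1004 into the second, which matches the grouping asked for in the question.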

g9icjywg2#

Borrowing jezrael's code for the setup:

df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1

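Use pd.Grouper(freq='2S') together with describe(). Older pandas releases spelled this pd.TimeGrouper('2S'), which has since been deprecated and removed:

df.groupby(pd.Grouper(freq='2S')).describe()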
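
The plain groupby idea from the question can also work without any time index, by mapping every pair of consecutive seconds to one group key. This is only a sketch under assumptions: the sample frame is rebuilt by hand from the question, and the names window and window_start_sec are made up for illustration.

```python
import pandas as pd

# Sample readings copied from the question (rebuilt by hand for this sketch).
df = pd.DataFrame({
    "sec": [1001] * 3 + [1002] * 3 + [1003] * 3 + [1004] * 4,
    "nanosec": [1, 2, 3] * 3 + [1, 2, 3, 4],
    "value": [0.2, 0.2, 0.2, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 0.1],
})

first_sec = df["sec"].min()

# Integer division assigns each row the number of its 2-second window,
# counted from the first reading: 1001 and 1002 -> 0, 1003 and 1004 -> 1.
window = (df["sec"] - first_sec) // 2

stats = df.groupby(window)["value"].agg(["mean", "std", "min", "max"])

# Relabel each window by the second at which it starts.
stats.index = first_sec + stats.index * 2
stats.index.name = "window_start_sec"
print(stats)
```

Because the key depends only on the sec column, an uneven number of readings per second (as in second 1001 of the real data) does not affect which rows are grouped together.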
