csv: How to count events within a predefined time range

Posted by 1qczuiv0 on 2022-12-15 in Other

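I want to count the number of events in each 1-second interval of a CSV data file and plot a histogram from the result, but I don't understand how to get the number of events per second. Can anyone help me with this?
The code is: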
from matplotlib import pyplot as pl
import pandas as pd
import numpy as np

def read_data():
    df = pd.read_csv("test.csv", usecols=['time', 'unix_time', 'name'])
    df['time'] = pd.to_datetime(df['time'])
    df['unix_time'] = (df['unix_time']).astype(int)
    df.info()

    i = 1

    time_counts = df.groupby((3600 * df.time.dt.minute + df.time.dt.second) // i * i)['time'].count()
    print(time_counts)

if __name__ == "__main__":
    read_data()

The output looks strange:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   time       33 non-null     datetime64[ns]
 1   unix_time  33 non-null     int32         
 2   name       33 non-null     object        
dtypes: datetime64[ns](1), int32(1), object(1)
memory usage: 788.0+ bytes

time
18        1
25217     1
43209     1
43219     1
46804     1
54047     1
61241     1
64815     1
64833     1
68402     1
75620     1
79235     1
82806     1
82837     2
86407     1
86446     1
93625     1
97254     1
104446    1
140438    1
144050    1
162025    1
169250    1
180050    1
183623    1
183658    1
194404    1
194412    2
194433    1
194438    1
205219    1
Name: time, dtype: int64

The data in the CSV file is:

time                    unix_time       name
2022-12-15 08:00:18.034 1671091218034   apple
2022-12-15 08:07:17.376 1671091637376   apple
2022-12-15 08:12:09.648 1671091929648   apple
2022-12-15 08:12:19.320 1671091939320   apple
2022-12-15 08:13:04.623 1671091984623   apple
2022-12-15 08:15:47.103 1671092147103   apple
2022-12-15 08:17:41.878 1671092261878   apple
2022-12-15 08:18:15.842 1671092295842   apple
2022-12-15 08:18:33.786 1671092313786   apple
2022-12-15 08:19:02.022 1671092342022   apple
2022-12-15 08:21:20.350 1671092480350   apple
2022-12-15 08:22:35.603 1671092555603   apple
2022-12-15 08:23:06.009 1671092586009   apple
2022-12-15 08:23:37.101 1671092617101   apple
2022-12-15 08:23:37.334 1671092617334   apple
2022-12-15 08:24:07.645 1671092647645   apple
2022-12-15 08:24:46.978 1671092686978   apple
2022-12-15 08:26:25.430 1671092785430   apple
2022-12-15 08:27:54.027 1671092874027   apple
2022-12-15 08:29:46.712 1671092986712   apple
2022-12-15 08:39:38.742 1671093578742   apple
2022-12-15 08:40:50.310 1671093650310   apple
2022-12-15 08:45:25.007 1671093925007   apple
2022-12-15 08:47:50.770 1671094070770   apple
2022-12-15 08:50:50.856 1671094250856   apple
2022-12-15 08:51:23.914 1671094283914   apple
2022-12-15 08:51:58.572 1671094318572   apple
2022-12-15 08:54:04.959 1671094444959   apple
2022-12-15 08:54:12.424 1671094452424   apple
2022-12-15 08:54:12.807 1671094452807   apple
2022-12-15 08:54:33.562 1671094473562   apple
2022-12-15 08:54:38.531 1671094478531   apple
2022-12-15 08:57:19.777 1671094639777   apple
9cbw7uwe #1

Use Grouper with a frequency of 1 second:

df['time'] = pd.to_datetime(df['time'])

time_counts = df.groupby(pd.Grouper(freq='1s', key='time'))['time'].count()
print(time_counts)
time
2022-12-15 08:00:18    1
2022-12-15 08:00:19    0
2022-12-15 08:00:20    0
2022-12-15 08:00:21    0
2022-12-15 08:00:22    0
                      ..
2022-12-15 08:57:15    0
2022-12-15 08:57:16    0
2022-12-15 08:57:17    0
2022-12-15 08:57:18    0
2022-12-15 08:57:19    1
Freq: S, Name: time, Length: 3422, dtype: int64
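
For context, the strange keys in the question come from the expression 3600 * minute + second, which drops the hour and scales minutes by 3600 rather than 60 (08:07:17 becomes 7 * 3600 + 17 = 25217). The Grouper approach can be tried on a small self-contained sample as sketched below. The inline rows are copied from the question, but the comma-separated layout and the variable names `raw` and `busy` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical inline stand-in for test.csv: four rows from the question,
# assumed here to be comma-separated.
raw = io.StringIO(
    "time,unix_time,name\n"
    "2022-12-15 08:23:06.009,1671092586009,apple\n"
    "2022-12-15 08:23:37.101,1671092617101,apple\n"
    "2022-12-15 08:23:37.334,1671092617334,apple\n"
    "2022-12-15 08:24:07.645,1671092647645,apple\n"
)
df = pd.read_csv(raw, parse_dates=["time"])

# One bin per second from the first to the last event;
# seconds without any event are kept with a count of 0.
time_counts = df.groupby(pd.Grouper(key="time", freq="1s"))["time"].count()

# Keep only the seconds that actually contain events.
busy = time_counts[time_counts > 0]
print(busy)
```

Because empty seconds stay in `time_counts` with a count of 0, the series is probably the more convenient input for a plot over time, while `busy` is easier to read when printed.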

Alternatively, use Series.dt.floor to remove the milliseconds and group by the result.
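
The answer's original code for this step did not survive on the page, so the following is only a minimal sketch of how Series.dt.floor might be applied here, not necessarily what the answerer wrote. The sample timestamps are taken from the question; the name `per_second` is an assumption.

```python
import pandas as pd

# Hypothetical sample: four timestamps taken from the question's data.
df = pd.DataFrame({"time": pd.to_datetime([
    "2022-12-15 08:54:04.959",
    "2022-12-15 08:54:12.424",
    "2022-12-15 08:54:12.807",
    "2022-12-15 08:54:33.562",
])})

# Truncate each timestamp to the start of its second,
# then count how many rows fall into each truncated value.
per_second = df.groupby(df["time"].dt.floor("s"))["time"].count()
print(per_second)
```

Unlike the Grouper version, this grouping only produces rows for seconds in which at least one event occurred, so there are no zero-count entries in the result.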
