Converting a CSV log to JSON, or any other mechanism, to get metrics

Posted by 6jygbczu on 2023-03-10 in Other

I have a log file and need to extract statistics from the entries it captures.
Sample of the log format:

2023-03-09 18:55:56,INFO,capturing: DataIngestion
2023-03-09 18:55:57,INFO,waiting to get data
2023-03-09 18:56:58,INFO,time started,2023-03-09 18:56:57
2023-03-09 18:56:59,INFO,data received processing started
2023-03-09 18:56:59,INFO,data to convert, 23000
2023-03-09 18:56:00,INFO,covert json,2023-03-09 18:56:57
2023-03-09 18:57:00,INFO,convesion time,0 days 00:01:00
2023-03-09 18:57:58,INFO,process data,2023-03-09 18:57:58
2023-03-09 18:59:01,INFO,process completed,0 days 00:02:03
2023-03-09 18:59:10,INFO,time taken,0 days 00:00:09
2023-03-09 18:59:02,INFO,waiting to get data
2023-03-09 18:59:03,INFO,time started,2023-03-09 18:59:03
2023-03-09 18:59:59,INFO,data received processing started
2023-03-09 18:59:59,INFO,data to convert,30000
2023-03-09 19:00:01,INFO,covert json,2023-03-09 19:00:01
2023-03-09 19:01:31,INFO,convesion time,0 days 00:01:30
2023-03-09 19:01:32,INFO,process data,2023-03-09 19:01:32
2023-03-09 19:04:30,INFO,process completed,0 days 00:03:28
2023-03-09 19:04:31,INFO,time taken,0 days 00:03:31

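One iteration runs from "data received processing started" to "time taken".
I need:
1. The total data, total conversion time, total process-completed time, and total time taken.
2. The minimum, maximum, and average data, together with the times that go with each. For example, given 10 iterations: the smallest data value among them, the largest data value among them, and the average over all 10, along with the corresponding conversion time, processing time, and time taken.
I started by converting the CSV to JSON, thinking I could extract the statistics from there, but I am stuck on how to go further. Here is what I tried: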

import csv
import json

fieldnames = ("logtime", "log", "type", "time stats")

# Context managers close both files even if an error occurs
with open('test1.log', 'r', newline='') as csvfile, open('file.json', 'w') as jsonfile:
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        # Write one JSON object per line
        json.dump(row, jsonfile)
        jsonfile.write('\n')

Please suggest how I can go on to get the statistics, or whether there is a better way to get them from the log I have provided.

Answer #1, by wnavrhmk

You can use pandas to read your CSV, pivot the segment of rows that corresponds to each iteration, and then concatenate the results:

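import pandas as pd

# The log has no header row, so name the columns; the timestamp is not needed here
df = pd.read_csv('log.csv', names=['time', 'type', 'event', 'value'],
                 skipinitialspace=True).drop(columns='time')

# Row positions where each iteration starts and ends (the end is exclusive)
entry_start = df[df['event'] == "data received processing started"].index
entry_end = df[df['event'] == "time taken"].index + 1

# Pivot each iteration into a single row, then stack the rows
df = pd.concat([
        df.iloc[s:e].pivot(index='type', columns='event', values='value')
        for s, e in zip(entry_start, entry_end)]
    ).drop(columns="data received processing started")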

This gives you a DataFrame in the following format:

event   convesion time          covert json data to convert process completed         process data       time taken
type                                                                                                               
INFO   0 days 00:01:00  2023-03-09 18:56:57           23000   0 days 00:02:03  2023-03-09 18:57:58  0 days 00:00:09
INFO   0 days 00:01:30  2023-03-09 19:00:01           30000   0 days 00:03:28  2023-03-09 19:01:32  0 days 00:03:31
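
As a possible alternative to slicing and concatenating, the sketch below labels every row with an iteration number using a cumulative sum over the start marker and pivots once, which also yields a unique index per iteration instead of a repeated `INFO`. The function name `iterations_from_log`, the `METRICS` list, and the inline `SAMPLE_LOG` (copied from the question) are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Sample copied from the question so the sketch runs without a file on disk.
SAMPLE_LOG = """\
2023-03-09 18:55:56,INFO,capturing: DataIngestion
2023-03-09 18:55:57,INFO,waiting to get data
2023-03-09 18:56:58,INFO,time started,2023-03-09 18:56:57
2023-03-09 18:56:59,INFO,data received processing started
2023-03-09 18:56:59,INFO,data to convert, 23000
2023-03-09 18:56:00,INFO,covert json,2023-03-09 18:56:57
2023-03-09 18:57:00,INFO,convesion time,0 days 00:01:00
2023-03-09 18:57:58,INFO,process data,2023-03-09 18:57:58
2023-03-09 18:59:01,INFO,process completed,0 days 00:02:03
2023-03-09 18:59:10,INFO,time taken,0 days 00:00:09
2023-03-09 18:59:02,INFO,waiting to get data
2023-03-09 18:59:03,INFO,time started,2023-03-09 18:59:03
2023-03-09 18:59:59,INFO,data received processing started
2023-03-09 18:59:59,INFO,data to convert,30000
2023-03-09 19:00:01,INFO,covert json,2023-03-09 19:00:01
2023-03-09 19:01:31,INFO,convesion time,0 days 00:01:30
2023-03-09 19:01:32,INFO,process data,2023-03-09 19:01:32
2023-03-09 19:04:30,INFO,process completed,0 days 00:03:28
2023-03-09 19:04:31,INFO,time taken,0 days 00:03:31
"""

# Hypothetical choice of which events to keep as columns (spelled as in the log).
METRICS = ["data to convert", "convesion time", "process completed", "time taken"]


def iterations_from_log(source):
    """Return one row per iteration, with one column per metric event."""
    df = pd.read_csv(source, names=["time", "type", "event", "value"],
                     skipinitialspace=True)
    # Each start marker bumps the counter, so all rows of an iteration share a label.
    df["iteration"] = (df["event"] == "data received processing started").cumsum()
    # Drop rows before the first iteration and events that are not metrics.
    metrics = df[(df["iteration"] > 0) & df["event"].isin(METRICS)]
    wide = metrics.pivot(index="iteration", columns="event", values="value")
    wide.columns.name = None
    return wide


wide = iterations_from_log(io.StringIO(SAMPLE_LOG))
print(wide)
```

Passing a file path instead of the `io.StringIO` object should work the same way, since `pd.read_csv` accepts both.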

You can then easily get the minimum, maximum, and mean:

# make sure the column is of type int
df['data to convert'] = df['data to convert'].astype(int)
data_min = df['data to convert'].min()
data_max = df['data to convert'].max()
data_avg = df['data to convert'].mean()

and the corresponding conversion time:

df.loc[df['data to convert']==data_min, 'convesion time']
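
To pull every duration for the smallest and largest iteration at once, rather than one column at a time, something like the following may help. It is a minimal sketch: the DataFrame is rebuilt by hand from the two rows shown above, with a default integer index (an assumption, so that `idxmin` and `idxmax` point at a single row), and the names `smallest`, `largest`, and `averages` are illustrative.

```python
import pandas as pd

# Values copied from the DataFrame printed in the answer.
# The default 0..n-1 index is an assumption; the answer's index is the repeated 'INFO'.
df = pd.DataFrame({
    "data to convert": ["23000", "30000"],
    "convesion time": ["0 days 00:01:00", "0 days 00:01:30"],
    "process completed": ["0 days 00:02:03", "0 days 00:03:28"],
    "time taken": ["0 days 00:00:09", "0 days 00:03:31"],
})

# Convert strings to proper numeric and timedelta types before any arithmetic.
df["data to convert"] = df["data to convert"].astype(int)
duration_cols = ["convesion time", "process completed", "time taken"]
for col in duration_cols:
    df[col] = pd.to_timedelta(df[col])

# Full rows for the iterations with the least and the most data.
smallest = df.loc[df["data to convert"].idxmin()]
largest = df.loc[df["data to convert"].idxmax()]

# Per-column averages over all iterations.
averages = {col: df[col].mean() for col in df.columns}

print(smallest, largest, averages, sep="\n\n")
```

If several iterations tie for the minimum or maximum, `idxmin` and `idxmax` return only the first one; the boolean mask used in the answer returns all of them.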

Edit: with time taken added to the output, you can sum over all elements of the column:

# converting column to timedelta
df['time taken'] = pd.to_timedelta(df['time taken'])
df['time taken'].sum()
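
The same idea extends to the other totals the question asks for. The sketch below sums the data and every duration column, then serializes the result to JSON, since that was the format the question started from. The DataFrame is again rebuilt by hand from the two rows shown earlier, and the `totals` and `report` names are illustrative assumptions.

```python
import json

import pandas as pd

# Values copied from the DataFrame printed in the answer.
df = pd.DataFrame({
    "data to convert": ["23000", "30000"],
    "convesion time": ["0 days 00:01:00", "0 days 00:01:30"],
    "process completed": ["0 days 00:02:03", "0 days 00:03:28"],
    "time taken": ["0 days 00:00:09", "0 days 00:03:31"],
})
duration_cols = ["convesion time", "process completed", "time taken"]

# Total amount of data plus the sum of each duration column.
totals = {"data to convert": int(df["data to convert"].astype(int).sum())}
for col in duration_cols:
    totals[col] = pd.to_timedelta(df[col]).sum()

# Timedelta objects are not JSON serializable, so store them as strings.
report = {key: (value if isinstance(value, int) else str(value))
          for key, value in totals.items()}
print(json.dumps(report, indent=2))
```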
