csv: Read a folder of log files and calculate event durations for unique IDs

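I have an air-gapped system (so software access is very limited) that generates usage logs every day. The logs contain unique IDs for devices, which I have so far managed to scrape and extract into a CSV. I then clean these up in LibreCalc (related to a question I asked here: https://superuser.com/questions/1732415/find-next-matching-event-in-log-and-compare-timings) and work out the event durations for each device.
As more devices are added this is getting harder, so I would like to automatically calculate the total duration per device and how many events occurred for that device. I have had some suggestions to use awk/sed, but I am a bit lost on how to implement it.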

Log Example
message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z

I already have a bash script that scrapes these events from the log files in a folder and outputs them all to a CSV file.

#!/bin/bash
#Just a datetime stamp for the flatfile
now=$(date +"%Y%m%d")
#Log file path, also where I define what month to scrape
LOGFILE='local.log-202301*'
#Shows what log files are getting read
echo $LOGFILE \n
#Output line by line to csv
awk '(/connect/ && ORS="\n") || (/disconnect/ && ORS=RS) {field1_var=$1" "$2" "$3","; print field1_var}' $LOGFILE > /home/user/logs/LOG_$now.csv

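Ideally I would like to keep that process, so that I can inspect the file manually if necessary, but ultimately I would prefer to automate the event calculation to produce something like the following: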

Desired Output Example
Device         Total Connection Duration     Total Connections
device01       0h 0m 0s                      0
device02       0h 1m 35s                     1
device03       0h 0m 0s                      0
device04       7h 11m 4s                     1
device05       6h 5m 5s                      1

Hopefully this is enough information; any help or pointers would be much appreciated. Thanks.

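This isn't based on your script at all, because I couldn't get it to produce a CSV, but anyway...
Here is an AWK script that computes the desired result for the given example log file: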

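# Seconds elapsed between two ISO 8601 UTC timestamps.
# mktime and strftime are gawk extensions.
function time_lapsed(from, to) {
  # "2023-01-10T09:20:21Z" becomes "2023 01 10 09 20 21 " for mktime.
  gsub(/[^0-9 ]/, " ", from);
  gsub(/[^0-9 ]/, " ", to);
  return mktime(to) - mktime(from);
}
BEGIN { OFS = "\t"; }
# Connect line: remember when the device connected and count the connection.
(/ connected/) {
  split($1, a, "=\"", _);
  split($3, b, "=", _);
  device_connected_at[a[2]] = b[2];
  device_connection_count[a[2]]++;
}
# Disconnect line: add the time since the stored connect to the device's total.
(/disconnected/) {
  split($1, a, "=\"", _);
  split($3, b, "=", _);
  device_connection_duration[a[2]]+=time_lapsed(device_connected_at[a[2]], b[2]);
}
END {
  print "Device","Total Connection Duration", "Total Connections";
  for (device in device_connection_duration) {
    # strftime renders the seconds as a clock time in the local time zone.
    print device, strftime("%Hh %Mm %Ss", device_connection_duration[device]), device_connection_count[device];
  }
}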

I used it on this example log file:

message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device06 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device02 connected" event_ts=2023-01-10T19:20:21Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z
message="device02 disconnected" event_ts=2023-01-10T21:41:33Z

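It produces the output below. Note that the hours are one too high (device04 was actually connected for 7h 11m 04s): strftime formats the seconds as a clock time in the local time zone, which here is evidently one hour ahead of UTC.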

Device  Total Connection Duration   Total Connections
device02    03h 22m 47s 2
device04    08h 11m 04s 1
device05    07h 05m 05s 1

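You can pass this program to awk without any flags and it should just work (assuming you haven't been messing with the field and record separators in your shell session).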
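Let me explain what is going on. First we define the time_lapsed function. In it, we convert the ISO 8601 timestamps into a format that mktime can handle (YYYY MM DD HH MM SS); we simply drop the offset, since everything is in UTC. Then we compute the difference between the epoch timestamps returned by mktime and return that result. (mktime interprets the values as local time, but that shift cancels out in the difference, except for an interval that spans a daylight-saving change.)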
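To make the conversion step concrete, here is a small sketch of what the gsub call does to a single timestamp. It sticks to POSIX awk, so it only shows the string that would be handed to mktime and does not call mktime itself. The function name to_datespec is made up for this illustration.

```shell
#!/bin/sh
# to_datespec is a hypothetical helper name, used only for this illustration.
# It applies the same gsub as time_lapsed: every character that is not a digit
# or a space becomes a space, leaving "YYYY MM DD HH MM SS" for mktime.
to_datespec() {
  printf '%s\n' "$1" | awk '{ gsub(/[^0-9 ]/, " "); printf "[%s]\n", $0 }'
}

to_datespec "2023-01-10T09:20:21Z"
```

The brackets are only there to make the trailing space left behind by the Z visible.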
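Next, in the BEGIN block, we set the output field separator OFS to a tab. Then we define two rules, one for log lines where a device connects and one for log lines where a device disconnects. Because of the default field separator, the input to these rules looks like this: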

$1: message="device02
$2: connected"
$3: event_ts=2023-01-10T09:20:21Z

We don't care about $2; we use split to get the device identifier and the timestamp from $1 and $3 respectively.
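As a quick check of that field handling, the following sketch runs the two split calls on one log line and prints what they extract. It omits the fourth split argument used in the script (a gawk extension) so that it runs in any POSIX awk. The function name parse_line is an assumption for this demo.

```shell
#!/bin/sh
# parse_line is a hypothetical name for this demo; it prints the device
# identifier and the timestamp that the two split calls pull out of one line.
parse_line() {
  printf '%s\n' "$1" | awk '{
    split($1, a, "=\"")   # $1 is message="device02, so a[2] is device02
    split($3, b, "=")     # $3 is event_ts=<timestamp>, so b[2] is the timestamp
    print a[2], b[2]
  }'
}

parse_line 'message="device02 connected" event_ts=2023-01-10T09:20:21Z'
```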
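In the rule for connecting devices, we use the device identifier as the key, store the time at which the device connected, and increment the connection count for that device. We don't need an initial assignment of 0, because an associative array in awk returns "" for a key that has no entry, and incrementing it coerces that to 0 first.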
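The behaviour of uninitialised array entries can be seen in isolation with a sketch like this one. The function name count_connections and the key never_seen are made up for the example.

```shell
#!/bin/sh
# count_connections is a hypothetical name; it counts " connected" lines
# for device02 without ever initialising the counter.
count_connections() {
  awk '/ connected/ {
    split($1, a, "=\"")
    count[a[2]]++          # first use: "" is treated as 0, then incremented
  }
  END {
    print "device02", count["device02"]
    # never_seen does not occur in the input; "" + 0 evaluates to 0
    print "never_seen", count["never_seen"] + 0
  }'
}

printf '%s\n' \
  'message="device02 connected" event_ts=2023-01-10T09:20:21Z' \
  'message="device02 disconnected" event_ts=2023-01-10T09:21:56Z' \
  'message="device02 connected" event_ts=2023-01-10T19:20:21Z' |
  count_connections
```

With these three lines the counter for device02 ends up at 2, because the pattern with the leading space does not match "disconnected".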
In the rule for disconnecting devices, we compute the time elapsed and add it to the total elapsed time for that device.

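**Note that this requires every connect in the log to have a matching disconnect.** In other words, this is quite fragile: a missing connect log line will throw off the calculation of the total connection time. A missing disconnect log line will increase the connection count without adding to the total connection time.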
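One way to find out whether a given set of logs meets that requirement is a separate checking pass before trusting the totals. The following is only a sketch under the assumption that the lines look exactly like the example log; check_pairs and is_open are names invented here, and it uses nothing beyond POSIX awk.

```shell
#!/bin/sh
# check_pairs is a hypothetical helper: it reports log lines that break the
# connect/disconnect pairing. It reads the files given as arguments, or stdin.
check_pairs() {
  awk '/connected/ {
    split($1, a, "=\"")
    dev = a[2]
    if ($2 ~ /^connected/) {
      if (is_open[dev]) print "line " NR ": " dev " connected twice"
      is_open[dev] = 1
    } else {
      if (!is_open[dev]) print "line " NR ": " dev " disconnected without connect"
      is_open[dev] = 0
    }
  }
  END {
    for (dev in is_open)
      if (is_open[dev]) print "end of log: " dev " still connected"
  }' "$@"
}

printf '%s\n' \
  'message="device02 connected" event_ts=2023-01-10T09:20:21Z' \
  'message="device02 disconnected" event_ts=2023-01-10T09:21:56Z' \
  'message="device04 disconnected" event_ts=2023-01-10T18:23:32Z' \
  'message="device06 connected" event_ts=2023-01-10T11:12:28Z' |
  check_pairs
```

On this sample it flags the device04 disconnect on line 3 and reports device06 as still connected at the end. When several devices are still connected, the order of those final lines is not defined by awk.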

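In the END rule, we print the header for the desired output and then, for each entry in the associative array device_connection_duration, we print the device identifier, the total connection duration and the total connection count.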
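A caveat about the formatting step: strftime treats its argument as a point in time and renders it as a clock time in the local time zone, so the hours shift with the local UTC offset and wrap around after 24 hours. As an alternative, a duration can be formatted with plain integer arithmetic. This is a minimal sketch in POSIX awk; format_duration is a hypothetical name, not part of the script above.

```shell
#!/bin/sh
# format_duration is a hypothetical helper name. It turns a number of seconds
# into "Hh Mm Ss" with integer arithmetic, so the result does not depend on
# the time zone and does not wrap around after 24 hours.
format_duration() {
  awk -v secs="$1" 'BEGIN {
    h = int(secs / 3600)
    m = int((secs % 3600) / 60)
    s = secs % 60
    printf "%dh %dm %ds\n", h, m, s
  }'
}

format_duration 8567     # device02 in the example log: 95 s + 8472 s
format_duration 90000    # a total of more than one day
```

The two calls print 2h 22m 47s and 25h 0m 0s. Inside the END rule the same three lines of arithmetic could replace the strftime call.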
I hope this gives you some ideas on how to tackle your task.
