Parsing Syslog with JSON data using PyParsing

Posted by kkbh8khc on 2023-03-20 in Other

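I want to build a syslog parser that converts my syslog messages, which carry JSON-like information in key=value form, into a .txt file that I can import into FortiSIEM. FortiSIEM is very picky about which syslog formats it accepts, and I cannot parse the "raw" syslog there, so I want to simplify the logs before they reach the SIEM.
I have run a few tests with PyParsing, but I don't really know how to use it. My output file is created, but it is empty.
I don't think I can share the real syslog, so here is a very rough example of what a message looks like:
<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
The script I came up with is: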

from pyparsing import Word, Suppress, alphanums, CharsNotIn, ZeroOrMore, Dict

# Define header
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums) + Suppress(" ")
timestamp = CharsNotIn(" ") + Suppress(" ")
hostname = CharsNotIn(" ") + Suppress(" ")
appname = CharsNotIn(" ") + Suppress(" ")
procid = CharsNotIn(" ") + Suppress(" ")
msgid = CharsNotIn("\n")
header = priority + version + timestamp + hostname + appname + procid + msgid

# Define key-value pairs
key = Word(alphanums + "_")
value = CharsNotIn("\n")
pair = key + Suppress("=") + value
kv_pairs = Dict(pair + ZeroOrMore(Suppress(",") + pair))

# Define message format
message = header + Suppress(" ") + kv_pairs

# Open input and output files
with open("syslog.txt") as input_file, open("syslog_output.txt", "w") as output_file:
    for line in input_file:
        try:
            # Convert to key-value format
            parsed_message = message.parseString(line.strip())
            kv_message = " ".join([f"{key}={value}" for key, value in parsed_message.items()])

            # Write the message to the output file
            output_file.write(parsed_message + "\n")
        except Exception as e:
            print(f"Failed to parse line: {line} with error: {e}")

            continue

When I run the script and print header and message, I get two exceptions:

Failed to parse line: "Whole Syslog Text"
 with error: Expected ' ', found '2022'  (at char 7), (line:1, col:8)

Failed to parse line: 
 with error: Expected '<'  (at char 0), (line:1, col:1)

Header:  {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
)}

Message:  {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
) Suppress:(' ') Dict:({W:(0-9A-Z_a-z) Suppress:('=') !W:(
) [{Suppress:(',') W:(0-9A-Z_a-z) Suppress:('=') !W:(
)}]...})}

I want output_file to look like this:

<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation -
key=value
key=value
key=value
...

I need to keep the header so that FortiSIEM can identify the log type.

Answer #1, by bxjv4tth

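As I mentioned in the comments, pyparsing skips whitespace by default, so all of the + Suppress(" ") terms should be removed. By the time Suppress(" ") is tried, the space has already been skipped, which is why the first error reads "Expected ' ', found '2022'".
CharsNotIn is an exception to the whitespace-skipping rule, and I found that Word(printables) works better.
I replaced your timestamp, hostname, etc. terms with Word(printables), as follows: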

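from pyparsing import Word, Suppress, alphanums, printables, rest_of_line

# priority and version as in the question, minus the Suppress(" ") terms
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums)
timestamp = Word(printables)
hostname = Word(printables)
appname = Word(printables)
procid = Word(printables)
msgid = rest_of_line
header = priority + version + timestamp + hostname + appname + '-' + procid + '-' + msgid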

I used the following code to test the parser:

header.run_tests("""\
    <140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
    """)

and got this:

<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
['140', '1', '2022-05-02T08:31:22.478Z', 'platform', 'dataexport', '-', 'syslog_variation', '-', ' {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}']
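To see the whitespace-skipping behavior in isolation, here is a minimal sketch that assumes pyparsing 3.x (for the snake_case names parse_string and as_list). The variable names with_space, without_space and explicit_space_ok are made up for illustration and do not come from the question.

```python
import pyparsing as pp

text = "1 2022-05-02T08:31:22.478Z"

# Style used in the question: an explicit Suppress(" ") between terms.
# pyparsing skips leading whitespace before trying each element, so the
# literal " " is looked for at "2022..." and never matches.
with_space = pp.Word(pp.alphanums) + pp.Suppress(" ") + pp.Word(pp.printables)
try:
    with_space.parse_string(text)
    explicit_space_ok = True
except pp.ParseException as err:
    explicit_space_ok = False
    print(f"explicit space fails: {err}")

# Style used in this answer: just list the terms and let pyparsing
# skip the whitespace between them.
without_space = pp.Word(pp.alphanums) + pp.Word(pp.printables)
tokens = without_space.parse_string(text).as_list()
print(tokens)
```

The second grammar is shorter and also tolerates more than one space between fields.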

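You will have to refine the definition of the key-value pairs. For key, use pyparsing's QuotedString('"'), since the key is enclosed in quotes. For value, you need to be more careful and read only up to the next comma or }, rather than all the way to the \n at the end of the line.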
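One possible way to do that refinement is sketched below. It is not a tested FortiSIEM recipe, and it rests on several assumptions: pyparsing 3.x; keys are either quoted or bare words such as info; = and : are both accepted as separators; values never contain a comma or a brace; and the body starts at the first { on the line. The helper names flatten and convert_line, and the dotted prefix for nested keys, are invented for this example.

```python
import pyparsing as pp

LBRACE, RBRACE, COMMA = map(pp.Suppress, "{},")

# A key is either quoted ("key") or a bare word (info, info2).
key = pp.QuotedString('"') | pp.Word(pp.alphanums + "_")
sep = pp.Suppress(pp.Literal("=") | pp.Literal(":"))

# A plain value runs up to the next comma or brace, not to end of line.
scalar = pp.Regex(r"[^,{}]+").set_parse_action(lambda t: t[0].strip())

# Objects can nest, so the object grammar refers to itself via Forward.
obj = pp.Forward()
value = pp.Group(obj) | scalar
pair = pp.Group(key + sep + value)
obj <<= LBRACE + pp.Optional(pair + pp.ZeroOrMore(COMMA + pair)) + RBRACE


def flatten(pairs, prefix=""):
    """Turn parsed pairs into 'key=value' strings (nested keys get a dotted prefix)."""
    lines = []
    for name, val in pairs:
        full = f"{prefix}{name}"
        if isinstance(val, pp.ParseResults):  # nested {...} object
            lines.extend(flatten(val, prefix=full + "."))
        else:
            lines.append(f"{full}={val}")
    return lines


def convert_line(line):
    """Return the header text followed by one key=value per line."""
    brace = line.index("{")  # assumption: the body starts at the first "{"
    header_text = line[:brace].rstrip()
    pairs = obj.parse_string(line[brace:], parse_all=True)
    return "\n".join([header_text] + flatten(pairs))


sample = (
    '<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - '
    '{"key"=value, info:{"key"=value, "key"=value, "key"=value}, '
    'info2:{"key"=value, "key"=value},"key"=value}'
)
print(convert_line(sample))
```

Group is used here instead of Dict because the sample repeats the same key name, and a Dict would keep only one entry per name. In the question's loop, the string returned by convert_line(line) could be written to output_file in place of parsed_message.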
