Fluentd: regex problem when parsing logs

qzwqbdag posted on 2023-01-10 in Other

I have the following fluentd configuration:

<source>
   @type tail
   <parse>
   @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
      time_format %d/%b/%Y:%H:%M:%S %z
      keep_time_key true
      types size:integer,reqtime:float,uct:float,uht:float,urt:float
   </parse>
   path /var/log/nginx/access.log
   pos_file /tmp/fluent_nginx.pos
   tag nginx
</source>

My log format:

193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014
193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005

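I tested my regex on regex101 and it works without any problems. In fluentd, however, I get a "no patterns matched" warning, and I don't understand why the logs are not parsed correctly.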

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

Can anyone help me? Thanks!

sg24os4d #1

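I think your problem is the leading spaces in the log.

Your pattern requires that nothing precede <remote>, but in your log the remote IP is preceded by 4 spaces.
In my opinion, the simplest fix is to allow an optional, variable number of spaces at the start of the pattern.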

^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*

How it works

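The ( ) is only there to make life easier for people reading the pattern: they can see that there is a space character between the parentheses, which they might otherwise miss.
The * means zero or more of them.
Together, this matches and discards zero or more spaces at the start of the line.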
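
The effect of the leading ( )* can be checked outside Fluentd. Below is a minimal sketch in Ruby, chosen because Fluentd is written in Ruby and its regexp parser uses Ruby regular expressions. The constant names (ORIGINAL, TOLERANT, LINE, INDENTED) are made up for illustration, and the 4-space indent is assumed from the description above rather than taken from a real log file.

```ruby
# Pattern from the question, unchanged.
ORIGINAL = /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/

# Same pattern with "( )*" inserted right after the "^" anchor.
TOLERANT = Regexp.new(ORIGINAL.source.sub(/\A\^/, "^( )*"))

# One of the sample log lines from the question.
LINE = '193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005'

# Assumption: the same line as it would look with 4 leading spaces.
INDENTED = "    " + LINE

p ORIGINAL.match?(LINE)              # => true
p ORIGINAL.match?(INDENTED)          # => false
p TOLERANT.match?(LINE)              # => true
p TOLERANT.match(INDENTED)[:remote]  # => "193.137.78.17"
```

With the indented line, the original pattern can only give <remote>, <host> and <user> empty values before it runs into a fourth space where it expects "[", so the match fails. The tolerant pattern consumes the spaces first and then behaves exactly like the original.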

By the way

I noticed that you sometimes escape " with \ and sometimes don't. Is there a reason for that?

vngu2lb8 #2

You should use the nginx parser plugin directly.
Here is a complete working example using the sample input plugin and the nginx parser plugin:

fluent-nginx-test.conf:
<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type nginx
  </parse>
</filter>

<match nginx>
  @type stdout
</match>
Run:
$ fluentd -c ./fluent-nginx-test.conf
Output:
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}

Apart from that, I used your regular expression with the regexp parser plugin and it also works fine (although the types field contains redundant values):

fluent-nginx-test-with-regexp.conf:
<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,reqtime:float,uct:float,uht:float,urt:float
  </parse>
</filter>

<match nginx>
  @type stdout
</match>
Run:
$ fluentd -c ./fluent-nginx-test-with-regexp.conf
Output:
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
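
To see roughly what the regexp parser does with this configuration, here is a small Ruby sketch that applies the same pattern, then mimics time_format and the two types entries that have matching capture groups (size and urt). It is an approximation for illustration, not Fluentd's actual implementation; parse_line, NGINX_PATTERN, TIME_FORMAT and SAMPLE are hypothetical names.

```ruby
require "time" # provides Time.strptime

NGINX_PATTERN = /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/

# Same format string as the time_format option in the config.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

SAMPLE = '193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005'

# Hypothetical helper: returns [event_time, record], or nil when the
# line does not match the pattern.
def parse_line(line)
  m = NGINX_PATTERN.match(line)
  return nil unless m

  record = m.named_captures                # String keys, String values
  record["size"] = Integer(record["size"]) # like size:integer
  record["urt"]  = Float(record["urt"])    # like urt:float
  [Time.strptime(record["time"], TIME_FORMAT), record]
end

event_time, record = parse_line(SAMPLE)
puts event_time.utc
p record["size"], record["urt"], record["status_code"]
```

The parsed time is 09:22:00 at offset +0000, which is the same instant as the 14:22:00 +0500 shown in the stdout output above. The names reqtime, uct and uht in types have no capture group in the pattern, which is why they are redundant.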

However, the error in your message, no patterns matched tag="nginx":

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

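This means that the configuration file has no match section for that tag. There must be a match section whose pattern covers the tag you want to process or output.
Example: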

<source>
  @type tail
  # ...
  tag nginx
</source>

# ...

<match nginx>
  @type stdout
</match>
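  • You should look at the "How do the match patterns work?" part of the config file syntax documentation for more guidance.
  • You might also want to use the vscode-fluentd extension for syntax highlighting in VS Code.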
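
As a rough illustration of the tag matching that decides whether an event reaches a match section, here is a simplified Ruby model. It covers only literal tag parts, * (exactly one tag part) and ** (zero or more tag parts) when ** stands alone or is the last part of the pattern. It ignores {X,Y} alternation and multiple patterns in one section, and it is not Fluentd's own code; tag_pattern_to_regexp and tag_matches? are hypothetical helpers.

```ruby
# Hypothetical helper: convert a simplified match pattern into a Regexp.
def tag_pattern_to_regexp(pattern)
  body = pattern.split(".").each_with_index.map do |part, i|
    sep = i.zero? ? "" : "\\."
    case part
    when "**" then i.zero? ? ".*" : "(?:\\.[^.]+)*" # zero or more parts
    when "*"  then "#{sep}[^.]+"                    # exactly one part
    else           "#{sep}#{Regexp.escape(part)}"   # literal part
    end
  end.join
  Regexp.new("\\A#{body}\\z")
end

# Hypothetical helper: does the tag fall under the pattern?
def tag_matches?(pattern, tag)
  tag_pattern_to_regexp(pattern).match?(tag)
end

p tag_matches?("nginx", "nginx")           # => true
p tag_matches?("nginx", "nginx.access")    # => false
p tag_matches?("nginx.*", "nginx.access")  # => true
p tag_matches?("nginx.*", "nginx")         # => false
p tag_matches?("nginx.**", "nginx")        # => true
```

In this model, an event tagged nginx is only picked up by a section such as <match nginx>, <match nginx.**> or <match **>. When no section matches, Fluentd logs the no patterns matched warning shown above.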
Environment:
  • fluentd
$ fluentd --version
fluentd 1.12.3
  • OS
$ lsb_release -d
Description:    Ubuntu 18.04.6 LTS
