Elasticsearch: filtering XML in nested objects with Ruby

xkrw2x1b  posted on 2022-12-03 in ElasticSearch

I have a log file in the following XML format:

<QuerySiteInformation>
    xmlns="http://www.example.com"
    <Site>
        <id>abc-cde-fvvvv</id>
        <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
        </Item>
    </Site>
    <SiteInteraction>
        <InteractionItem>
            <Location>
                <id>8496940--2842047577555</id>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            </Location>
        </InteractionItem>
    </SiteInteraction>
</QuerySiteInformation>

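I want to change the XML tag <objectMessage>message in multiple lines</objectMessage> to <objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>, but only when the <objectMessage> tag sits inside an <Item> tag.
I have the configuration below, which finds the following tag in the XML and converts it to the message I want: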

<objectMessage>Internal> message shown here in multiple lines</objectMessage>

Configuration

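filter {
  mutate {
    # gsub takes triples: field name, regex pattern, replacement.
    # The field name "message" is an assumption; the pattern is a placeholder.
    gsub => [
      "message", "some regex pattern can do the xml tag filtering", "MESSAGE HAS BEEN REMOVED"
    ]
  }
}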

However, this changes every <objectMessage> message shown here in multiple lines</objectMessage>, including the ones outside the <Item> element.
I know this could be done better with the ruby plugin, and that regex should not be used to parse XML, but this is the closest I have found so far.
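
To make the scoping problem concrete, here is a minimal Ruby sketch of a two-pass replacement: the outer pass isolates each <Item> block and the inner pass rewrites <objectMessage> only within it. The method name `redact_item_messages` and the shortened sample are hypothetical, and the sketch assumes <Item> elements are never nested and carry no attributes. It is still regex over XML, so it shares the fragility mentioned above.

```ruby
# Sketch: limit the replacement to <objectMessage> elements nested in <Item>.
# Assumes <Item> elements are never nested in one another and have no attributes.
PLACEHOLDER = "<objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>"

def redact_item_messages(xml)
  # Outer pass: match each <Item>...</Item> block (non-greedy, spanning newlines).
  xml.gsub(%r{<Item>.*?</Item>}m) do |item|
    # Inner pass: replace the message only inside that block.
    item.gsub(%r{<objectMessage>.*?</objectMessage>}m) { PLACEHOLDER }
  end
end

sample = <<~XML
  <Site>
    <Item>
      <objectMessage>Internal> message shown here
      in multiple lines</objectMessage>
    </Item>
  </Site>
  <SiteInteraction>
    <Location>
      <objectMessage>Internal> message shown here</objectMessage>
    </Location>
  </SiteInteraction>
XML

puts redact_item_messages(sample)
```

Inside a Logstash ruby filter, the same two nested `gsub` calls could be applied to `event.get("message")` and written back with `event.set`, assuming the raw XML lives in the `message` field.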

cygmwpex1#

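Ideally, you would use the built-in xml filter plugin, which is the more reliable and maintainable approach:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html
The following conf file parses the XML and replaces the values of the inner objects. Note that, as written, the ruby filter overwrites each whole <Item> entry under the first <Site>, as the output further down shows: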

input {
    generator {
        lines => [
        '<QuerySiteInformation>
            xmlns="http://www.example.com"
            <Site>
            <id>abc-cde-fvvvv</id>
            <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
            </Item>
            <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
            </Item>
            </Site>
            <SiteInteraction>
            <InteractionItem>
            <Location>
                <id>8496940--2842047577555</id>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            </Location>
            </InteractionItem>
            </SiteInteraction>
        </QuerySiteInformation>'
        ]
        count => 1
    }
}

filter {
    xml {
        source => "message"
        target => "xml"
        store_xml => true
        remove_field => ["message"]
    }
}

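filter {
  ruby {
    code => '
      # Walk the <Item> entries of the first <Site> element.
      items = event.get("[xml][Site][0][Item]")
      if items.is_a?(Array)
        items.each_index do |index|
          # Note: this overwrites the whole Item entry, not only its objectMessage.
          event.set("[xml][Site][0][Item][#{index}]", "REMOVED MESSAGE")
        end
      end
    '
  }
}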

output {
    stdout {
        codec => rubydebug
    }
}

Output:

{
          "host" => {
        "name" => "Mac-Studio.local"
    },
      "@version" => "1",
    "@timestamp" => 2022-11-28T13:47:31.352282Z,
         "event" => {
        "original" => "<QuerySiteInformation>\n            xmlns=\"http://www.example.com\"\n            <Site>\n            <id>abc-cde-fvvvv</id>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            </Site>\n            <SiteInteraction>\n            <InteractionItem>\n            <Location>\n                <id>8496940--2842047577555</id>\n                <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            </Location>\n            </InteractionItem>\n            </SiteInteraction>\n        </QuerySiteInformation>",
        "sequence" => 0
    },
           "xml" => {
                "content" => [
            [0] "\n            xmlns=\"http://www.example.com\"\n            ",
            [1] "\n            ",
            [2] "\n        "
        ],
                   "Site" => [
            [0] {
                  "id" => [
                    [0] "abc-cde-fvvvv"
                ],
                "Item" => [
                    [0] "REMOVED MESSAGE",
                    [1] "REMOVED MESSAGE"
                ]
            }
        ],
        "SiteInteraction" => [
            [0] {
                "InteractionItem" => [
                    [0] {
                        "Location" => [
                            [0] {
                                           "id" => [
                                    [0] "8496940--2842047577555"
                                ],
                                "objectMessage" => [
                                    [0] "Internal> message shown here in multiple lines"
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
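
In the output above each Item entry has been replaced wholesale, so its id, code and reference are lost as well. If only objectMessage should change, a sketch along the following lines may be closer to what the question asks for. The helper name `redact_items!` is hypothetical, and the shape of an Item entry (a hash whose values are arrays) is an assumption inferred from how Location appears in the output.

```ruby
# Sketch: overwrite only objectMessage inside every Site -> Item entry.
# `parsed` mimics the structure stored under the "xml" field in the output
# above, where every child element is wrapped in an array.
REPLACEMENT = "MESSAGE HAS BEEN REMOVED"

def redact_items!(parsed)
  sites = parsed["Site"]
  return parsed unless sites.is_a?(Array)

  sites.each do |site|
    items = site.is_a?(Hash) ? site["Item"] : nil
    next unless items.is_a?(Array)

    items.each do |item|
      # Skip entries that are not hashes or have no objectMessage.
      next unless item.is_a?(Hash) && item.key?("objectMessage")
      item["objectMessage"] = [REPLACEMENT]
    end
  end
  parsed
end

parsed = {
  "Site" => [
    { "id" => ["abc-cde-fvvvv"],
      "Item" => [
        { "id" => ["e5753ead-d202-451e-92cc-ea49d0a6bdf5"],
          "objectMessage" => ["Internal> message shown here in multiple lines"] }
      ] }
  ],
  "SiteInteraction" => [
    { "InteractionItem" => [
      { "Location" => [
        { "objectMessage" => ["Internal> message shown here in multiple lines"] }
      ] }
    ] }
  ]
}

redact_items!(parsed)
p parsed["Site"][0]["Item"][0]
```

In a Logstash ruby filter the same loop could run on `event.get("xml")`, with the result written back through `event.set("xml", ...)`, since changes made to the value returned by `event.get` are not guaranteed to reach the event on their own.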
