Regex breaks on the "/" character instead of on newline

Posted by bvjxkvbb on 2023-05-08 in Other


I am trying to build a regex for a Splunk search that should extract the TLD from a URL. The source is Panorama logs.
RegEx: ^(?:https?:\/\/)?(?<host>[^\/]+)?(?<tld>\.[^.?\/\n]+).*$
Test data:

https://example.org/
qq.com
https://border.example.com/?bridge=basket&blood=animal
360.cn
http://example.com/?brother=bike
smugmug.com
shop-pro.jp

The RegEx and the test data are on Regex101.com; I generated the test data with randomlists.com to anonymize the source data. The capture group is required; it is named only for readability.
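Describe what you tried
Match the TLD from a set of URLs; some have a leading protocol, some do not. The input records should be separated by newlines, and a match should never extend beyond a single record.
What you expected to happen
All TLDs are matched and end up in the capture group.
And the actual result
Lines that end in / work, but lines without a / do not.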
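
The behaviour described above can be reproduced outside Splunk. The following is a minimal Python sketch, not part of the original post: it assumes the whole test data is matched as one multi-line string (as on Regex101 with the multiline flag), and it rewrites the named groups in Python's `(?P<name>...)` spelling. The names `ORIGINAL`, `FIXED`, `tlds` and `multi_line_matches` are made up for illustration. The host class `[^/]+` excludes only "/", so on a line without a slash it can run across the newline into the next record; the assumed fix is to exclude `\n` there as well.

```python
import re

# Test data from the question, one record per line.
TEST_DATA = "\n".join([
    "https://example.org/",
    "qq.com",
    "https://border.example.com/?bridge=basket&blood=animal",
    "360.cn",
    "http://example.com/?brother=bike",
    "smugmug.com",
    "shop-pro.jp",
])

# The pattern from the question, in Python syntax.
ORIGINAL = re.compile(
    r"^(?:https?://)?(?P<host>[^/]+)?(?P<tld>\.[^.?/\n]+).*$",
    re.MULTILINE,
)

# Assumed fix (not from the original post): the host class also excludes "\n",
# so the host group can no longer cross a record boundary.
FIXED = re.compile(
    r"^(?:https?://)?(?P<host>[^/\n]+)?(?P<tld>\.[^.?/\n]+).*$",
    re.MULTILINE,
)


def tlds(pattern, text):
    """Return the tld group of every match, in order."""
    return [m.group("tld") for m in pattern.finditer(text)]


def multi_line_matches(pattern, text):
    """Return every full match that spans more than one record."""
    return [m.group(0) for m in pattern.finditer(text) if "\n" in m.group(0)]


if __name__ == "__main__":
    print(tlds(ORIGINAL, TEST_DATA))                # one match too few
    print(multi_line_matches(ORIGINAL, TEST_DATA))  # smugmug.com + shop-pro.jp merged
    print(tlds(FIXED, TEST_DATA))                   # one tld per record
```

With the seven sample records, the original pattern merges "smugmug.com" and "shop-pro.jp" into one match, while the variant with `[^/\n]+` yields one match per record. Whether this is the right fix inside Splunk depends on how the events are split, which the post does not show.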


Source: https://stackoverflow.com/questions/76162456/regex-breaks-with-character-instead-of-newline

1 answer

42fyovps1#

You can do all of this without rex, using eval and mvexpand.
A run-anywhere example:

| makeresults
| eval urls="https://www.example.org/|http://example.com/|ca.gov|http://blade.example.com/bikes/airplane.php|http://alarm.example.com/|smugmug.com|shop-pro.jp|https://example.org/|qq.com|pcworld.com|symantec.com|360.cn|http://example.com/?brother=bike|http://www.example.com/behavior/bead.php|army.mil|https://example.com/boy/bedroom.php|https://example.com/|https://www.example.com/brother?activity=believe|https://www.example.net/achiever/bottle.html|http://believe.example.com/bit?bait=base&bone=ball|aboutads.info|http://www.example.com/|http://www.example.edu/afternoon|livejournal.com|http://border.example.com/box/afterthought|oaic.gov.au|https://www.example.edu/base.php|house.gov|smh.com.au|http://www.example.edu/|https://www.example.org/|lycos.com|https://border.example.com/?bridge=basket&blood=animal|hibu.com|http://example.com/"
| eval urls=split(urls,"|")
| mvexpand urls
| eval busted=split(urls,":")
| eval busted=mvindex(trim(split(trim(replace(mvfilter(match(busted,"\.")),"\/"," "))," ")),0)

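I combined the last few steps into a single line, but this is what it does:

Split the list of URLs on the pipe ("|") character
mvexpand the multivalue field
split each URL on the ":" character (if there is none, split has nothing to split on)
Take the zeroth (first) element of the following mvfilter-matched split:
everything that contains a period (".")
replace slashes ("/") with spaces (" "), and
split on spaces (" ")

The FQDN you want is now in busted
Extracting the TLD is now trivial. Add the following: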
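
For readers who want to check the steps outside Splunk, here is a rough Python equivalent of the eval chain. It is only a sketch: the function name `extract_host` is hypothetical, and taking the first piece that contains a period is an assumption about how the multivalue result behaves, not something the SPL states.

```python
def extract_host(url):
    """Mirror the eval steps: split on ':', keep the pieces that contain a
    period, turn slashes into spaces, trim, split on spaces, take the first."""
    pieces = [p for p in url.split(":") if "." in p]
    if not pieces:
        return None  # nothing that looks like a host name
    # Assumption: use the first dotted piece. A URL with a port and a dotted
    # path would produce more than one piece.
    cleaned = pieces[0].replace("/", " ").strip()
    return cleaned.split(" ")[0]


if __name__ == "__main__":
    for url in ["https://www.example.org/", "qq.com",
                "http://blade.example.com/bikes/airplane.php"]:
        print(url, "->", extract_host(url))
```

Because every record is handled on its own, this approach never has the cross-line problem from the question.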

| rex field=busted "(?<tld>[0-9a-zA-Z][0-9a-zA-Z_\-]+?\.[0-9a-zA-Z]+)$"

Or, to use only eval and skip rex entirely, do the following:

| eval tld=mvindex(split(busted,"."),-2) +"."+ mvindex(split(busted,"."),-1)
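
The eval above joins the last two dot-separated labels of the host name. A small Python sketch of the same idea follows (the function name `last_two_labels` is hypothetical). Note that the result is the domain plus its suffix, such as "example.org", rather than the bare TLD, and that hosts under two-part suffixes such as "smh.com.au" come out as "com.au".

```python
def last_two_labels(host):
    """Join the last two dot-separated labels of a host name."""
    return ".".join(host.split(".")[-2:])


if __name__ == "__main__":
    for host in ["www.example.org", "qq.com", "smh.com.au"]:
        print(host, "->", last_two_labels(host))
```

If the bare TLD is what you need, taking only the last label (index -1) would be the equivalent change, under the same caveat about two-part suffixes.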

Answered 2023-05-08
