How to extract adjacent strings from a txt file with multiple conditions using awk?

Posted by mefy6pfw on 2023-03-13 in Other

I have this txt file:

[23/10/10 14:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Request session="lkjh" id=12321>
<type>Old</type>
</Request>
[23/10/10 15:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Request session="lkjhab" id=432>
<type>New</type>
</Request>
[23/10/10 16:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Response session="lkjh" id=12321>
<type>Old</type>
</Response>

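I need to use awk to get all requests and responses that have id=12321 and type "Old". I have never used awk before, and I can't find a way to get the lines adjacent to the line that contains the id string.
The only way I have managed to get multiple lines is with grep, but that only works with a single pattern.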

$ grep id=12321 file.txt -B2 -A2
[23/10/10 14:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Request session="lkjh" id=12321>
<type>Old</type>
</Request>
--
[23/10/10 16:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Response session="lkjh" id=12321>
<type>Old</type>
</Response>

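But with grep I can't restrict the result to the requests and responses that have both id=12321 and type "Old".
Maybe I'm taking the wrong approach? Any help would be appreciated.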

regex

Source: https://stackoverflow.com/questions/75626750/how-to-extract-adjacent-strings-from-txt-file-with-multiple-conditions-using-awk

4 Answers

mrfwxfqh1#

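With gnu-awk, you can set the RS variable so that either </Request> or </Response> acts as the record separator, and then check $0 for both search terms. Printing $0 followed by RT puts back the closing tag that was consumed as the separator: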

awk -v RS='</Re(quest|sponse)>' '/id=12321/ && /<type>Old/ {print $0 RT}' file

[23/10/10 14:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Request session="lkjh" id=12321>
<type>Old</type>
</Request>

[23/10/10 16:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Response session="lkjh" id=12321>
<type>Old</type>
</Response>
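
For awks without a regex RS or RT (multi-character RS is left unspecified by POSIX, and RT is specific to gawk), the same record-at-a-time check can be sketched by buffering lines until the closing tag. This is only a minimal sketch: the function name extract_by_close_tag is made up for illustration, and the stricter " id=12321>" pattern is an assumption so that ids such as 123210 are not matched by accident.

```shell
# Hypothetical helper: print every Request/Response block whose opening tag
# has id=12321 and whose type is Old. Uses only POSIX awk features.
# Usage: extract_by_close_tag FILE
extract_by_close_tag() {
    awk '
    { buf = buf $0 ORS }                      # collect the current block
    /^<\/(Request|Response)>/ {               # a closing tag ends the block
        if (buf ~ / id=12321>/ && buf ~ /<type>Old<\/type>/)
            printf "%s", buf                  # both conditions hold: print it
        buf = ""                              # start collecting the next block
    }' "$1"
}
```

Called as extract_by_close_tag file.txt on the file from the question, it prints the two matching blocks back to back, without the blank line that print $0 RT leaves between them.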

omjgkv6w2#

Like this, using a proper XML parser, xidel:

$ xidel -s --input-format=text file.txt -e '
    for $x in tokenize($raw,"\[.+\]  DEBUG")[.]
    return parse-xml($x)[./*[@id=12321 and type="Old"]]
' --output-node-format=xml --output-node-indent

Credit to Reino.

Output:

<Request session="lkjh" id="12321">
  <type>Old</type>
</Request>
<Response session="lkjh" id="12321">
  <type>Old</type>
</Response>

qqrboqgw3#

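A common solution is to set the record separator RS to a value that uniquely identifies the start of a new record, so that the current record in each iteration contains all the lines you want to check (one entry, or one related sequence of lines). Your test data doesn't contain any literal opening square bracket other than the one that starts each timestamp line, so here is a simple demo that works for your sample data: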

$ awk 'BEGIN { RS="[" } NR>1 && /id=12321/ && /<type>Old<\/type>/ { print "[" $0 }' <<\:
> [23/10/10 14:37:44:527 EST]  DEBUG
> <?xml version="1.1" encoding="UTF-8" ?>
> <Request session="lkjh" id=12321>
> <type>Old</type>
> </Request>
> [23/10/10 15:37:44:527 EST]  DEBUG
> <?xml version="1.1" encoding="UTF-8" ?>
> <Request session="lkjhab" id=432>
> <type>New</type>
> </Request>
> [23/10/10 16:37:44:527 EST]  DEBUG
> <?xml version="1.1" encoding="UTF-8" ?>
> <Response session="lkjh" id=12321>
> <type>Old</type>
> </Response>
> :
[23/10/10 14:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Request session="lkjh" id=12321>
<type>Old</type>
</Request>

[23/10/10 16:37:44:527 EST]  DEBUG
<?xml version="1.1" encoding="UTF-8" ?>
<Response session="lkjh" id=12321>
<type>Old</type>
</Response>

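If you also need to accommodate literal square brackets in the data, you could sacrifice the separator line (the one with the square brackets and DEBUG) and use a regular expression that treats the whole line as the separator. That means the content of that line is discarded as the separator and is not included in the output (you will notice that my code above adds back the [ that was "eaten" as the separator).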
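
To make that trade-off concrete, here is a minimal sketch that uses the entire timestamp line as the record separator. It assumes an awk that accepts a regular expression in RS (gawk, mawk and busybox awk do; POSIX leaves a multi-character RS unspecified), and it assumes every separator line starts with [ and ends with DEBUG, as in the sample. The function name extract_without_header is made up for illustration.

```shell
# Hypothetical helper: split records on the whole "[...]  DEBUG" line.
# Requires an awk with regex RS support (an extension beyond POSIX).
# Usage: extract_without_header FILE
extract_without_header() {
    awk '
    BEGIN { RS = "\\[[^\n]*DEBUG\n" }         # the whole header line is the separator
    / id=12321>/ && /<type>Old<\/type>/ {
        printf "%s", $0                       # $0 holds only the XML lines
    }' "$1"
}
```

On the file from the question this prints the two matching XML blocks, but the [23/10/10 ...] DEBUG lines are gone, because they were consumed as separators. In gawk the text that matched RS is available in RT, so the header could be recovered by saving RT while processing the previous record; that part is left out here to keep the sketch usable in more than one awk.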

wb1gzix04#

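With the samples shown, please try the following code. It should work in any POSIX-compliant awk (the {2} interval expressions may not be supported by older implementations such as mawk 1.3.3). Written and tested only with the samples shown.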

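awk '
# A timestamp line starts a new block: print the previous block if it matched.
/^\[[0-9]{2}\/[0-9]{2}\/[0-9]{2}/{
  if(flag2){
    print value
  }
  flag1=flag2=value=""
}
# Append every line (including the timestamp line) to the current block.
{
  value=(value?value ORS:"") $0
}
# flag1: the block contains the wanted id.
/ id=12321>/{
  flag1=1
  next
}
# flag2: the type is Old and the id was already seen in this block.
/<type>Old<\/type>/ && flag1{
  flag2=1
}
# Print the last block if it matched.
END{
  if(flag2){
    print value
  }
}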
'   Input_file
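
The hard-coded id and type can also be passed in from the shell. The following is a minimal sketch along the lines of the code above, not the answer author's code: the function name extract_by_id_type and the variable names are made up, index() is used so that the values are matched literally rather than as regular expressions, and (unlike the code above) the id and type lines may appear in either order within a block.

```shell
# Hypothetical helper: print the blocks whose id and type match the arguments.
# Uses only POSIX awk features.
# Usage: extract_by_id_type ID TYPE FILE
extract_by_id_type() {
    awk -v id="$1" -v type="$2" '
    /^\[/ {                                   # a timestamp line starts a new block
        if (has_id && has_type) print block   # flush the previous block if it matched
        block = ""; has_id = has_type = 0
    }
    { block = (block == "" ? "" : block ORS) $0 }
    index($0, " id=" id ">")           { has_id = 1 }
    index($0, "<type>" type "</type>") { has_type = 1 }
    END { if (has_id && has_type) print block }
    ' "$3"
}
```

For example, extract_by_id_type 12321 Old Input_file selects the same two blocks as the code above, and extract_by_id_type 432 New Input_file selects the middle one.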
