Perl: extracting the text between two blocks with a regular expression

wgmfuz8q  posted on 2022-11-24  in  Perl

I am trying to extract the text between two strings using the regular expression below.

(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)

This regex looks fine in regex101, but when used with perl or grep -P it somehow does not print the pod details.

kubectl describe  node |perl -le '/(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/m; printf "$1"'

Here is the sample input:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

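Questions

1. How can I extract the information from the output above so that it looks like the following? What is wrong with the regex or the command I am using?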

Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)

2. What if I have two similar blocks in the input? How do I extract the pod details then? For example:

If the input is:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:
....some
.......random data...
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:
hk8txs48  1#

With a few obvious assumptions, and keeping it close to the pattern in the question:

perl -0777 -wnE'
    @pods = /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs;
    say for @pods
' input-file

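(Note the modifiers at the end of the regex line, which is too wide to fit on the screen: /gs)

The regex from the question also works in place of the one in this answer for a single block of text; it already carries (?s) inline, so it needs no separate /s modifier. To handle multiple blocks, the (.*) in it has to be changed to (.*?), so that it does not match all the way to the last Allocated...

The question does not say precisely how the regex is "used with perl", so I cannot say what fails.

Notes on the command-line program above: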

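  • The -0777 switch makes it read the whole file into a single string, available in the program as the variable $_, to which the regex binds by default.

    There is also the -g switch, an alias for -0777, available since 5.36.0.

  • We still need the -n switch so that the program iterates over the "lines" of the input (STDIN or a file); with -0777 there is only one such "line", the whole input.
  • Because the match operator is in list context (its result is assigned to the array @pods), it returns the regex captures; with /g, those of every match.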
6jjcrrmo  2#

With GNU grep, you can use your regex with a few tweaks:

kubectl describe  node |
grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
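  • \K (match reset) after \R drops everything matched so far, including the "Non-terminated Pods" line, from the output.
  • The -z option treats input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline, so the pattern can match across newlines.

PS: The same regex also works with the second input; the header lines are printed at the start of each block.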
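
As a rough check of the multi-block behaviour, the sketch below runs the same pattern over a made-up, abbreviated two-block sample. It assumes GNU grep built with PCRE support (-P). Because -z makes grep end each match with a NUL byte, the output is passed through tr to drop those bytes before it is captured.

```shell
# Hypothetical, abbreviated two-block sample written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Non-terminated Pods:          (2 in total)
  Namespace    Name
  default      foo
Allocated resources:
....some random data...
Non-terminated Pods:          (2 in total)
  Namespace    Name
  default      foo-1
Allocated resources:
EOF

# -z: whole input is one record; -o: print only the match; \K: forget what was matched before it.
# tr -d '\0' removes the NUL that grep -z writes after each match.
blocks=$(grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)' "$tmp" | tr -d '\0')
printf '%s\n' "$blocks"

rm -f "$tmp"
```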

Alternatively, you can use any version of sed for this job:

kubectl describe  node |
sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
anauzrmj  3#

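With your shown samples, please try the following GNU awk code, written and tested in GNU awk. A simple explanation: set RS for the input file to the regex Non-terminated Pods:.*Allocated resources:. Then, in the main program, check that RT is not empty, use awk's gsub function to replace (^|\n)Non-terminated Pods:[^\n]*\n|\nAllocated resources:\n* in RT with an empty string, and print its value, which gives the output expected for the shown samples.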

awk -v RS='Non-terminated Pods:.*Allocated resources:' '
RT{
  gsub(/(^|\n)Non-terminated Pods:[^\n]*\n|\nAllocated resources:\n*/,"",RT)
  print RT
}
'  Input_file
j2datikz  4#

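For a very large file, one possible solution is to read it line by line, as follows.
Select the range of lines of interest and drop the last line, which is not part of the desired output.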

use strict;
use warnings;

while(<>) {
    if( /^  Namespace/ .. /^Allocated resources:/ ) {
        print unless /^Allocated resources:/;
    }
}

exit 0;

Output

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
