Linux: matching a pattern and extracting part of it

xtfmy6hx, posted on 2023-08-03 in Linux

I have a log file that looks like the one below. It shows that many files are missing, and I want to list them.

$cat datafile.txt
    /data/kay/20091012.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091013.txt FNR=1) fatal: file not file
    /data/kay/20091014.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091015.txt FNR=1) fatal: file not file
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091016.txt FNR=1) fatal: file not file
    /data/kay/20091017.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091018.txt FNR=1) fatal: file not file

I want to list the dates for which the file is missing. My script is below:

$script.sh
awk '{if($1 -eq "gawk")print $4}' datafile.txt
echo ${echo $(awk '{if($1 -eq "gawk")print $4}' datafile.txt):-14,8}


It fails with an error: "bad substitution".
My desired output:

$outfile.txt
20091013
20091015
20091016
20091018
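
A note on the "bad substitution" error: `${...}` expects a parameter name (optionally followed by an expansion operator), not a command such as `echo`. The sketch below is one possible way to trim a single token with POSIX parameter expansion once the value is stored in a variable. The variable names `token` and `date_part` are illustrative assumptions, not part of the original script.

```shell
# Assumption: one sample token, as found in field 4 of a "gawk:" line.
token='(FILENAME=/data/kay/out/501_20091013.txt'

# Expansion operators act on a variable, so the value is stored first.
date_part=${token##*_}      # remove everything up to the last "_"
date_part=${date_part%%.*}  # remove everything from the first "."
echo "$date_part"
```

This handles one value at a time, so for a whole log file the awk, sed, or grep approaches in the answers are likely a better fit.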

Answer 1 (6ovsh4lw)

Using sed

$ sed -En '/^ +gawk/s/[^)]*_([^.]*).*/\1/woutput.txt' input_file
$ cat output.txt
20091013
20091015
20091016
20091018

Using awk

$ awk -F"[_.]" '/gawk/{print $3 > "output.txt" }' input_file
$ cat output.txt
20091013
20091015
20091016
20091018
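
To see why `$3` is the date in the awk command above, it can help to print every field produced by the `[_.]` separator. This is only an illustrative sketch on one sample line; the variable names `line`, `fields`, and `third` are assumptions introduced for the example.

```shell
# One sample "gawk:" line from the log shown in the question.
line='    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091013.txt FNR=1) fatal: file not file'

# Print each field with its index when splitting on "_" or ".".
fields=$(printf '%s\n' "$line" |
  awk -F'[_.]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"

# The third field alone.
third=$(printf '%s\n' "$line" | awk -F'[_.]' '{ print $3 }')
echo "$third"
```

The dot in `cmd.` and the underscore in `501_` are the first two separators, which is why the date lands in field 3. A log line with a different number of dots or underscores before the date would need a different field index.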

Answer 2 (m3eecexj)

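Using grep. Select the gawk: lines first; run against the whole file, the date pattern would also match the dates in the .csv paths.

grep 'gawk:' input_file | grep -oE '[0-9]{4}[0-9]{1,2}[0-9]{1,2}'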


Answer 3 (pexxcrt2)

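With GNU grep and the samples you have shown, please try the following solution. It uses PCRE mode (-P) with lazy matching, \K to discard everything matched so far, and a lookahead so that .txt is required but not printed.

grep -oP '^[[:space:]]+gawk:.*?out/[0-9]+_\K.*?(?=\.txt)' Input_file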


Answer 4 (ajsxfq5m)

Assume the content of datafile.txt is

/data/kay/20091012.csv
gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091013.txt FNR=1) fatal: file not file
/data/kay/20091014.csv
gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091015.txt FNR=1) fatal: file not file
gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091016.txt FNR=1) fatal: file not file
/data/kay/20091017.csv
gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091018.txt FNR=1) fatal: file not file

By running

awk '{if($1 -eq "gawk")print $4}' datafile.txt


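you instruct awk to subtract eq from the first field and then concatenate the result with gawk. The first field is non-numeric and eq is an unset variable, so both count as 0 and the expression evaluates to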

0gawk


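This non-empty string is considered true in a boolean context, so the 4th field is printed for every line (or an empty string if there is no such field).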
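
The evaluation described above can be checked directly by printing the expression instead of using it as a condition. This is a small sketch on two made-up input lines; `result_gawk` and `result_csv` are names introduced only for this example.

```shell
# awk parses $1 -eq "gawk" as ($1 - eq) followed by concatenation with "gawk".
result_gawk=$(echo 'gawk: cmd. line:4: (FILENAME=x' |
  awk '{ print ($1 -eq "gawk") }')
result_csv=$(echo '/data/kay/20091012.csv' |
  awk '{ print ($1 -eq "gawk") }')

echo "$result_gawk"
echo "$result_csv"
```

Both lines produce the same non-empty string, which is why the condition cannot tell the gawk: lines apart from the others.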
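For a comparison you should use ==. Taking into account that you are looking for lines whose first field is gawk: (with the colon), the fixed code becomes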

awk '{if($1=="gawk:")print $4}' datafile.txt


which gives the output

(FILENAME=/data/kay/out/501_20091013.txt
(FILENAME=/data/kay/out/501_20091015.txt
(FILENAME=/data/kay/out/501_20091016.txt
(FILENAME=/data/kay/out/501_20091018.txt


However, the if is not needed here, because GNU AWK programs are built from pattern-action pairs, so the above can be written as

awk '$1=="gawk:"{print $4}' datafile.txt


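Now you need to clean up the output. I suggest the following heuristic: keep the one or more digits that come after _ and before ., which can be done with the gensub function as follows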

awk '$1=="gawk:"{print gensub(/.*_([[:digit:]]+)[.].*/, "\\1", 1, $4)}' datafile.txt


which gives

20091013
20091015
20091016
20091018


Note that [.] denotes a literal dot, whereas . outside square brackets denotes any character.

(tested in GNU Awk 5.1.0)
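
The difference between `.` and `[.]` noted above can be seen on a tiny made-up input. This sketch is separate from the tested command in this answer; the input strings and the names `any_char` and `literal_dot` are assumptions for illustration.

```shell
# Count the lines matched by each pattern on two sample lines.
any_char=$(printf 'a.b\naXb\n' | awk '/a.b/ { n++ } END { print n + 0 }')
literal_dot=$(printf 'a.b\naXb\n' | awk '/a[.]b/ { n++ } END { print n + 0 }')

echo "unbracketed dot matched $any_char lines"
echo "bracketed dot matched $literal_dot lines"
```

The unbracketed dot matches both lines, while the bracketed dot matches only the line that really contains a dot.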
Answer 5 (pepwfjgg)

A basic ERE regex is enough:

echo '
    /data/kay/20091012.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091013.txt FNR=1) fatal: file not file
    /data/kay/20091014.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091015.txt FNR=1) fatal: file not file
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091016.txt FNR=1) fatal: file not file
    /data/kay/20091017.csv
    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091018.txt FNR=1) fatal: file not file' |
mawk 'NF *= 2 < NF' OFS= FS='^[ \t]+gawk:[^_]+_|[.].+fatal.+$'
20091013
20091015
20091016
20091018

Trimming the regex further, down to the absolute minimum:

gawk 'NF *= 2 < NF' OFS= FS='^.+gawk:.+_|[.].+$'
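
The `NF *= 2 < NF` idiom is compact but hard to read: it keeps a record only when the separator regex split it into more than two fields, and it rebuilds the record with an empty OFS. A more explicit sketch of the same idea, assuming the first (longer) separator regex from this answer and two sample lines, might look like this. The names `csv`, `line`, and `extracted` are illustrative.

```shell
csv='    /data/kay/20091012.csv'
line='    gawk: cmd. line:4: (FILENAME=/data/kay/out/501_20091013.txt FNR=1) fatal: file not file'

# The separator matches the text before and after the date, so on a
# "gawk:" line the fields are: empty, the date, empty (NF is 3).
extracted=$(printf '%s\n%s\n' "$csv" "$line" |
  awk -F'^[ \t]+gawk:[^_]+_|[.].+fatal.+$' 'NF > 2 { print $2 }')
echo "$extracted"
```

Printing `$2` directly avoids relying on the side effect of assigning to NF, at the cost of a slightly longer program.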


There is also a very, very unorthodox way to extract the number:

nawk '$+_ = +$2' FS='_' 
mawk '$0  = +$2' FS='_' 
gawk '$_  = +$2' FS='_'
20091013
20091015
20091016
20091018
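
These one-liners rely on awk's string-to-number conversion: a string is converted using only its leading numeric prefix, and a string with no numeric prefix converts to 0, which is false as a pattern. The following sketch shows that conversion in isolation; the sample strings and the names `num` and `zero` are assumptions for illustration.

```shell
# Text after the "_" on a "gawk:" line: only the leading digits are used.
num=$(awk 'BEGIN { s = "20091013.txt FNR=1) fatal"; print s + 0 }')

# A line without "_" has an empty second field, which converts to 0.
zero=$(awk 'BEGIN { s = ""; print s + 0 }')

echo "$num"
echo "$zero"
```

Because the result is assigned to the record and used as the pattern, lines that convert to 0 are dropped and the others print just the number.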
