shell: How to compare values across 3 files using awk

blpfk2vs  posted on 2023-08-07  in  Shell
Follows (0) | Answers (4) | Views (90)

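I am trying to compare some common and unique values across three files, and I want to combine all three files into one file. I want to find, first, the values common to all three files; second, all values present in at least any two files; and third, the values unique to one file relative to the other two. I wrote some code, but it only gives me the common ones.
I only want to compare the "name" column between the files (not all columns), but I want to print all three columns of all three files.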

$ cat file_1    
id,name,value
1,a,20
2,b,34
3,c,5

$ cat file_2
id,name,value
1,a,27
2,b,55
7,d,15
9,z,100

$ cat file_3
id,name,value
1,a,77
2,b,95
11,d,83
6,y,109

Expected output:

id,name,value,id2,name2,value2,id3,name3,value3
1,a,20,1,a,27,1,a,77
2,b,34,2,b,55,2,b,95
3,c,5,NA,NA,NA,NA,NA,NA
NA,NA,NA,7,d,15,11,d,83
NA,NA,NA,9,z,100,NA,NA,NA
NA,NA,NA,NA,NA,NA,6,y,109


I tried this code, but it only gives me a comparison between two files. It does not give me the unique values.

#!/usr/bin/awk -f

BEGIN {
    FS = OFS = ",";  
    print "id,name,value,id2,name2,value2,id3,name3,value3";  
}

function print_na_row(name) {
    print name, "NA", "NA", "NA", "NA", "NA", "NA";
}

# Read file1.csv and store the data in an array
NR == FNR && FNR > 1 {
    id = $1;
    name = $2;
    value = $3;
    
    file1_data[name] = id "," value;
    next;
}

# Read file2.csv and file3.csv and merge with data from file1.csv
FNR > 1 {
    id = $1;
    name = $2;
    value = $3;
    
    current_row = id "," value;
    if (name in file1_data) {
        print name, file1_data[name], current_row;
        delete file1_data[name];  # Remove the matched entry to handle unique values in file1.csv
    } else {
        print_na_row(name);
    }
}
END {
    for (id in file1_data) {
        print_na_row(name);
    }
}


Please help me with this.

h6my8fg2 #1

One awk idea:

awk '
BEGIN  { FS = OFS = "," }
FNR==1 { fcnt++; next }
       { names[$2]; lines[$2 FS fcnt] = $0 }
END    { print "id,name,value,id2,name2,value2,id3,name3,value3"
         for (name in names) {
             out = sep = ""
             for (i=1; i<=fcnt; i++) {
                 out = out sep (lines[name FS i] ? lines[name FS i] : "NA,NA,NA")
                 sep = OFS
             }
             print out
         }
       }
' file_1 file_2 file_3

Notes:

  • per OP's comment we do not have to worry about ordering, otherwise ...
  • OP would need to specify the sort criteria

This generates:

id,name,value,id2,name2,value2,id3,name3,value3
NA,NA,NA,NA,NA,NA,6,y,109
NA,NA,NA,9,z,100,NA,NA,NA
1,a,20,1,a,27,1,a,77
2,b,34,2,b,55,2,b,95
3,c,5,NA,NA,NA,NA,NA,NA
NA,NA,NA,7,d,15,11,d,83
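
If the row order of the expected output matters, one possible criterion is the order in which each name is first seen across the files. That criterion is an assumption, since OP has not specified one. The sketch below is a variant of the script above under that assumption; `merge_by_name`, `seen`, and `order` are hypothetical names introduced here, and the header line is still hard-coded for three files.

```shell
#!/bin/sh
# Sketch only: merge_by_name is a hypothetical helper, not part of the answer above.
# Assumption: rows should appear in first-seen order of the name column ($2).
merge_by_name() {
    awk '
    BEGIN  { FS = OFS = "," }
    FNR==1 { fcnt++; next }                     # new file: bump counter, skip header
           { if (!($2 in seen)) {               # first time this name appears
                 seen[$2]
                 order[++n] = $2                # remember first-seen position
             }
             lines[$2 FS fcnt] = $0             # whole row, keyed by name + file number
           }
    END    { print "id,name,value,id2,name2,value2,id3,name3,value3"
             for (k = 1; k <= n; k++) {
                 out = sep = ""
                 for (i = 1; i <= fcnt; i++) {
                     key = order[k] FS i
                     out = out sep ((key in lines) ? lines[key] : "NA,NA,NA")
                     sep = OFS
                 }
                 print out
             }
           }
    ' "$@"
}

# Demo with the sample data from the question
dir=$(mktemp -d)
printf 'id,name,value\n1,a,20\n2,b,34\n3,c,5\n' > "$dir/file_1"
printf 'id,name,value\n1,a,27\n2,b,55\n7,d,15\n9,z,100\n' > "$dir/file_2"
printf 'id,name,value\n1,a,77\n2,b,95\n11,d,83\n6,y,109\n' > "$dir/file_3"
merge_by_name "$dir/file_1" "$dir/file_2" "$dir/file_3"
rm -rf "$dir"
```

With the three sample files from the question, this prints the rows in the same order as the expected output shown in the question.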

aydmsdu9 #2

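I don't understand why you would want the expected output you show, and I find it hard to believe you really want output like that. So here is some output that makes sense to me, which you can massage to suit. It follows the usual format expected from a join operation: the common field first, then the distinct fields from each input file.
Using any awk: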

$ cat tst.awk
BEGIN {
    FS=OFS=","
    dfltRow = "NA" FS "NA"
}
FNR == 1 {
    ++numFiles
    $1 = $1 "_" numFiles
    $3 = $3 "_" numFiles
}
{
    if ( !seen[$2]++ ) {
        keys[++numKeys] = $2
    }
    rows[$2 FS numFiles] = $1 FS $3
}
END {
    for ( keyNr=1; keyNr<=numKeys; keyNr++ ) {
        key = keys[keyNr]
        printf "%s", key
        for ( fileNr=1; fileNr<=numFiles; fileNr++ ) {
            idx = key FS fileNr
            row = ( idx in rows ? rows[idx] : dfltRow )
            numFlds = split(row,flds)
            for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
                printf "%s%s", OFS, flds[fldNr]
            }
        }
        print ""
    }
}


$ awk -f tst.awk file_{1..3}
name,id_1,value_1,id_2,value_2,id_3,value_3
a,1,20,1,27,1,77
b,2,34,2,55,2,95
c,3,5,NA,NA,NA,NA
d,NA,NA,7,15,11,83
z,NA,NA,9,100,NA,NA
y,NA,NA,NA,NA,6,109

u0sqgete #3

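"I want to find the values common to all three files; second, all values present in at least any two files; and third, the values unique to one file relative to the other two."
I suggest the following approach: store line counts in a 2D array whose keys are the line and the file name. Consider the following simple example, where the content of file1.txt is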

A
B
C

the content of file2.txt is

C
B


and the content of file3.txt is

C
C


Then

awk '{arr[$0][FILENAME]+=1}END{for(i in arr){print i,"is present in",length(arr[i]),"file(s)"}}' file1.txt file2.txt file3.txt


gives the output

A is present in 1 file(s)
B is present in 2 file(s)
C is present in 3 file(s)


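Explanation: for each line, increase by 1 the value in the array arr stored under the keys current line content ($0) and file name. Once all files have been processed, print each line together with the number of files it is present in. Note that this code does not count the repeated line in file3.txt as 2 files. It should also work with more than 3 files. Arrays of arrays such as arr[$0][FILENAME] are a GNU Awk extension.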

(tested in GNU Awk 5.1.0)
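
Applied to the CSV files from the question, the same counting idea could look like the sketch below. It keys on the name column ($2) only, skips the header lines, and uses a plain `seen[name, FILENAME]` key instead of an array of arrays, so it should run in any POSIX awk. `count_names` is a hypothetical helper name, and piping through sort is just one way to get a stable order.

```shell
#!/bin/sh
# Sketch only: count_names is a hypothetical helper, not part of the answer above.
# Prints "name,count" where count is the number of files containing that name.
count_names() {
    awk -F, '
        FNR == 1 { next }                        # skip the header of each file
        !seen[$2, FILENAME]++ { nfiles[$2]++ }   # count each name once per file
        END {
            for (name in nfiles)
                print name "," nfiles[name]
        }
    ' "$@" | sort
}

# Demo with the sample data from the question
dir=$(mktemp -d)
printf 'id,name,value\n1,a,20\n2,b,34\n3,c,5\n' > "$dir/file_1"
printf 'id,name,value\n1,a,27\n2,b,55\n7,d,15\n9,z,100\n' > "$dir/file_2"
printf 'id,name,value\n1,a,77\n2,b,95\n11,d,83\n6,y,109\n' > "$dir/file_3"
count_names "$dir/file_1" "$dir/file_2" "$dir/file_3"
rm -rf "$dir"
```

A count of 3 marks names common to all three files, 2 marks names shared by any two files, and 1 marks names unique to a single file.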

2izufjch #4

awk -F, '
    FILENAME != file { fidx+=1; file=FILENAME }
    FNR == 1{
        for(i=1; i<=NF;i++) $i=$i fidx","
        header=header $0;
        next
    }
    !($1","1 in a){
        ids[$1]
        a[$1","1]=a[$1","2]=a[$1","3]="NA,NA,NA"
    }
    { a[$1","fidx]=$0 }
    END{
        gsub(/ |,$/,"",header)
        print header
        for (id in ids){
            out=""
            for(i=1; i<=3; i++){
                out=out sprintf("%s,",a[id","i])
            }
            sub(/,$/,"",out)
            print out
        }
    }
'  input_file[1-3]

id1,name1,value1,id2,name2,value2,id3,name3,value3
1,a,20,1,a,27,1,a,77
2,b,34,2,b,55,2,b,95
3,c,5,NA,NA,NA,NA,NA,NA
NA,NA,NA,NA,NA,NA,6,y,109
NA,NA,NA,7,d,15,NA,NA,NA
NA,NA,NA,9,z,100,NA,NA,NA
NA,NA,NA,NA,NA,NA,11,d,83
