Compare two columns if the value exists in both CSV files

Posted by 58wvjzkj on 2023-09-27 in Other

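The goal is to collect (using AWK) the names of the files that need to be copied to the target file system. A file qualifies if any of the following holds:
1. It is listed in source.csv but not in target.csv.
2. Its size differs between the two files.
3. Its timestamp in the source is later than its timestamp in the target.

Source.csv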

"2023-08-25","test/test2/filename1","10.00 B"
"2023-07-25","test/test2/filename2","15.00 B"
"2023-07-25","test/test2/filename3","5.00 B"
"2023-07-25","test/test2/filename4","5.00 B"

Target.csv

"2023-08-25","test/test2/filename0","10.00 B"
"2023-07-25","test/test2/filename2","10.00 B"
"2023-07-24","test/test2/filename3","5.00 B"
"2023-07-25","test/test2/filename4","5.00 B"

Expected output:

"2023-08-25","test/test2/filename1","10.00 B"  ### Because it does not exist in target.csv
"2023-07-25","test/test2/filename2","10.00 B"  ### Because the size is different
"2023-07-24","test/test2/filename3","5.00 B"   ### Because the timestamp in source.csv is greater than in target.csv (meaning there is a newer version in source that is not in target)

For the files that exist only in the source, I use:
awk -v FS="," 'BEGIN { OFS = FS } FNR == NR { unique[$2]; next } !($2 in unique) { print $2 }' target.csv source.csv | tr -d "\"" > files_to_copy.txt
But I have not been able to work out the code for the other two conditions; I lack the AWK knowledge. Any help? :)

csv

Source: https://stackoverflow.com/questions/77173730/compare-two-columns-if-value-exists-in-both-csv-files

2 Answers

nc1teljy1#

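Assumptions:

All fields are enclosed in a pair of double quotes
No data field contains embedded or escaped double quotes
File names are unique within a file (i.e. a file name does not appear more than once in the same file)
All sizes are measured in B (bytes)
The first non-blank character of the size field is a digit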
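
As an aside, two of the assumptions above (unique file names and sizes given in B) can be checked mechanically before running the comparison. The following is only a rough sketch: the function name check_csv and the message wording are made up for illustration, and the size pattern is a guess at what "a number followed by B" looks like in this data.

```shell
# check_csv FILE  (hypothetical helper, not part of the original answer)
# Prints one line per violation; prints nothing if the file looks fine.
check_csv() {
    awk -F'"' '
    # With a double quote as separator, the file name is $4 and the size is $6.
    seen[$4]++ {
        printf "%s: duplicate file name: %s\n", FILENAME, $4
    }
    # Assumed size format: optional blanks, digits and dots, one space, then B.
    $6 !~ /^ *[0-9][0-9.]* B$/ {
        printf "%s: unexpected size field: %s\n", FILENAME, $6
    }
    ' "$1"
}
```

Running check_csv target.csv and check_csv source.csv on the sample data from the question should print nothing, since every name is unique and every size ends in " B".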

One awk idea:

awk -F'"' '                          # input field separator is double quote => data values are in even-numbered fields
FNR==NR { unique[$4]                 # use filename index for arrays
          size[$4]=$6+0              # "+0" will strip spaces and trailing "B", leaving us with just a number
          date[$4]=$2
          next
        }
!( $4 in unique       ) ||           #      if source file not in unique[] array then print current line
 ( size[$4] != ($6+0) ) ||           # (or) if sizes are different then print current line
 ( $2 > date[$4]      )              # (or) if source date is greater than target date then print current line
' target.csv source.csv

This produces:

"2023-08-25","test/test2/filename1","10.00 B"
"2023-07-25","test/test2/filename2","15.00 B"
"2023-07-25","test/test2/filename3","5.00 B"
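
The question asks for the bare file names (it strips the quotes with tr and writes to files_to_copy.txt), whereas the script above prints whole CSV lines. A minimal variation, under the same assumptions as this answer, is to print only the fourth quote-delimited field. The wrapper function name files_to_copy is hypothetical.

```shell
# files_to_copy TARGET SOURCE  (hypothetical wrapper name)
# Prints the unquoted names of source files that are missing from the target,
# differ in size, or have a later date than in the target.
files_to_copy() {
    awk -F'"' '
    FNR == NR {                      # first file on the command line: the target
        size[$4] = $6 + 0            # "+0" keeps only the leading number
        date[$4] = $2
        next
    }
    !($4 in size) || size[$4] != $6 + 0 || $2 > date[$4] {
        print $4                     # file name only, already without quotes
    }
    ' "$1" "$2"
}
```

Usage would be files_to_copy target.csv source.csv > files_to_copy.txt. Note that the FNR == NR test cannot tell the two files apart when the target file is empty, so that case would need separate handling.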

2023-09-27

rta7y2nd2#

Using any POSIX awk, regardless of which characters (other than newlines) appear in the file names in the CSV, and assuming each file name is unique:

$ cat tst.awk
BEGIN { FS="," }
{
    name = $0
    gsub(/^"[^"]*|[^"]*"$/,"",name)
}
NR == FNR {
    d[name] = $1
    s[name] = $NF
    next
}
!(name in d) || ($1 > d[name]) || ($NF != s[name])

$ awk -f tst.awk Target.csv Source.csv
"2023-08-25","test/test2/filename1","10.00 B"
"2023-07-25","test/test2/filename2","15.00 B"
"2023-07-25","test/test2/filename3","5.00 B"

The above assumes that the first and last fields of the CSV contain no commas or double quotes.
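
To see why this works even when a file name contains a comma, it can help to print what the script actually uses as its key. The sketch below reuses the same gsub() call. The function name show_key and the output format are made up for illustration.

```shell
# show_key  (hypothetical helper): read CSV lines on stdin and print the key,
# date and size that the script above would derive from each line.
show_key() {
    awk 'BEGIN { FS = "," }
    {
        name = $0
        # Remove the leading quote plus first field text, and the last field
        # text plus trailing quote; whatever lies between them is the key.
        gsub(/^"[^"]*|[^"]*"$/, "", name)
        printf "key=[%s] date=[%s] size=[%s]\n", name, $1, $NF
    }'
}

printf '%s\n' '"2023-08-25","test/a,b","10.00 B"' | show_key
# prints: key=[","test/a,b","] date=["2023-08-25"] size=["10.00 B"]
```

The key keeps the surrounding quote and comma characters, which is harmless because both files are processed the same way. The size is compared as a string here, so "10.00 B" and "10.0 B" would count as different.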

2023-09-27
