Comparing URL lists for redirects

Posted by g6baxovj on 2021-06-21 in Mysql

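I have 2 CSV files, each containing more than 3,000 URLs.
My task is to create an .htaccess "redirect" block from the "old site" to the "new site". Rather than going through and comparing them by hand, I figured I could try a bash/python script, or import them into MySQL and compare them there.
So, in bash, I tried the following code: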


#!/bin/bash

awk 'BEGIN{FS=OFS="/"} {gsub(/\/$/, ""); $NF=tolower($NF)} NR==FNR{a[$NF]=$0; next} $NF in a {print a[$NF] " " $0 > "combined.csv"}' oldsite.csv newsite.csv

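However, it returns an empty "combined.csv", so I thought maybe Python... but alas, I know very little about Python, so I thought of MySQL. If I just import each CSV into its own new table, I could run a comparison SQL statement and dump the results into a new 2-column table. Alas, I don't really know where to start with the comparison beyond a LIKE statement. What I'd like to know is which method is "best" (meaning the most accurate comparison). And if it is Python, how?
CSV samples
New URLs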

"new-url"
"/product/dangle-hoop-earrings-for-girls-with-cz-and-heart-dangle-in-14k-gold/"
"/product/dangle-hoop-earrings-for-girls-with-cz-and-butterfly-dangle-in-14k-gold/"
"/product/petite-lever-back-earrings-for-little-girls-in-14k-yellow-gold-with-blue-topaz-high-end-childrens-earrings/"

Old URLs

"old-url"
"/product/0903-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Heart-Dangle-in-14K-Gold/"
"/product/0954-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Butterfly-Dangle-in-14K-Gold/"
"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/"

Expected combined result

"old-url", "new-url"
"/product/0903-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Heart-Dangle-in-14K-Gold/", "/product/dangle-hoop-earrings-for-girls-with-cz-and-heart-dangle-in-14k-gold/"
"/product/0954-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Butterfly-Dangle-in-14K-Gold/", "/product/dangle-hoop-earrings-for-girls-with-cz-and-butterfly-dangle-in-14k-gold/"
"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/", "/product/petite-lever-back-earrings-for-little-girls-in-14k-yellow-gold-with-blue-topaz-high-end-childrens-earrings/"
Answer #1 (2w3rbyxf)

As we discovered in the comment thread, you need to convert your data for use in awk/Unix by removing the \r part of the MS-DOS line endings with

dos2unix file

which converts the line endings of file from \r\n to \n. Note that you can call dos2unix with multiple filenames and each file will be processed, i.e.

dos2unix old.csv new.csv many_more ...
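To illustrate why the stray \r matters, here is a minimal Python sketch. The sample lines and the helper name strip_cr are made up for the illustration and are not part of the original answer. It shows that splitting DOS-formatted text on \n alone, which is how awk splits records by default, leaves a \r at the end of every record, so keys taken from the end of a line will not compare equal to keys from a Unix-formatted file.

```python
def strip_cr(lines):
    """Remove the trailing line terminator (\\n or \\r\\n) from each line."""
    return [line.rstrip("\r\n") for line in lines]


# Hypothetical sample data: the same two records with DOS and Unix endings.
dos_lines = ['"/product/a/"\r\n', '"/product/b/"\r\n']
unix_lines = ['"/product/a/"\n', '"/product/b/"\n']

# Splitting on "\n" only keeps the "\r" on each DOS record.
raw = "".join(dos_lines).split("\n")[:-1]
print(raw[0].endswith("\r"))  # True

# Once the terminators are stripped, both versions compare equal.
print(strip_cr(dos_lines) == strip_cr(unix_lines))  # True
```
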

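Here is your code, modified so that it creates a separate file for the records in the "new" file that have no match. The only thing I found that needed correcting was changing the final output to include the , char, so print a[$NF] "," $0 .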


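#!/bin/bash

awk 'BEGIN{FS=OFS="/"}
  { # runs for every line of both files: drop a trailing slash, lowercase the last field
    gsub(/\/$/, "")
    # print "#dbg: FILENAME="FILENAME "\tNR="NR "\tFNR="FNR
    $NF=tolower($NF)
  }
  NR==FNR{
    # first file (oldsite.csv): remember each line under its last field
    a[$NF]=$0; next
  }
  {
    # second file (newsite.csv): look up the last field among the old lines
    if ($NF in a) {
      print  a[$NF] "," $0  > "combined.csv"
    }
    else {
      # a[$NF] is empty in this branch, so the output line starts with a comma
      print  a[$NF] "," $0  > "unmatched.csv"
    }
  }
  ' oldsite.csv newsite.csv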

Output

cat combined.csv

"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/","/product/dangle-hoop-earrings-for-girls-with-cz-and-heart-dangle-in-14k-gold/"
"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/","/product/dangle-hoop-earrings-for-girls-with-cz-and-butterfly-dangle-in-14k-gold/"
"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/","/product/petite-lever-back-earrings-for-little-girls-in-14k-yellow-gold-with-blue-topaz-high-end-childrens-earrings/"

cat unmatched.csv
,"new-url"
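Note that in the combined.csv shown above every new URL is paired with the same old URL. That appears to happen because each sample line ends in /" and not in /, so the last /-separated field is just the closing quote and all records share one key. Since the question also asks how this could be done in Python, here is a rough sketch of one alternative and not a drop-in replacement for the awk script. The function names slug_key, match_urls and read_urls are made up for the example. It assumes that the last path segment identifies the product, that case differences and runs of hyphens (--- versus -) can be ignored, and that an Apache mod_alias Redirect 301 line is the desired end result.

```python
import csv
import re


def slug_key(url):
    """Last path segment, lowercased, with runs of hyphens collapsed to one."""
    segment = url.strip().strip("/").split("/")[-1]
    return re.sub(r"-+", "-", segment.lower())


def match_urls(old_urls, new_urls):
    """Pair old and new URLs by slug key; return (pairs, unmatched_new)."""
    old_by_key = {slug_key(u): u for u in old_urls}
    pairs, unmatched = [], []
    for new in new_urls:
        old = old_by_key.get(slug_key(new))
        if old is None:
            unmatched.append(new)
        else:
            pairs.append((old, new))
    return pairs, unmatched


def read_urls(path):
    """Read a one-column CSV with a header row; the csv module removes the
    quotes and, with newline="", copes with DOS line endings."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    return [row[0] for row in rows[1:] if row]


# Sample rows taken from the question; the last new URL is a made-up
# example of a record that has no counterpart on the old site.
old_urls = [
    "/product/0903-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Heart-Dangle-in-14K-Gold/",
    "/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/",
]
new_urls = [
    "/product/dangle-hoop-earrings-for-girls-with-cz-and-heart-dangle-in-14k-gold/",
    "/product/petite-lever-back-earrings-for-little-girls-in-14k-yellow-gold-with-blue-topaz-high-end-childrens-earrings/",
    "/product/no-such-item/",
]

pairs, unmatched = match_urls(old_urls, new_urls)
for old, new in pairs:
    print(f"Redirect 301 {old} {new}")
print("unmatched:", unmatched)
```

With the real files, old_urls and new_urls would come from read_urls("oldsite.csv") and read_urls("newsite.csv"). If two old URLs share the same slug, only the last one is kept in the lookup, so it is worth reviewing the unmatched list and spot-checking the pairs before pasting anything into .htaccess.
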

IHTH
