Shell: error when replacing column elements with awk

bybem2ql, posted on 2023-06-30 in Shell

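I'm working with the Titanic CSV and have been trying to replace the values in columns 5 and 12, Sex and Embarked: Sex should hold m/f instead of male/female, and column 12 should hold the full name of the port instead of its first letter.
The CSV originally looks like this: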

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,Nan,Q

After the change it should look like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,Southampton
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,Cherbourg
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Queenstown
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,Cherbourg
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

But it does not replace the values in column 12, except on the last line, while the Sex column is replaced correctly:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

The script is as follows:

BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS = ","
}

{
    # Change the sex column value to "f" if it is "female" or to "m" if it is "male"
    if ($5 == "female") 
        $5 = "f"
    else if ($5 == "male") 
        $5 = "m"
    
    # Perform the substitution in the embarked column
    if ($12 == "C") 
        $12 = "Cherbourg"
    else if ($12 == "Q") 
        $12 = "Queenstown"
    else if ($12 == "S") 
        $12 = "Southampton"
     
    print $0
}

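To clarify, the values in column 12 contain no spaces or other characters that would make the match fail; in Python, the replacement works fine.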

cwdobuhd #1

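Since the change in the middle of the line works but the change at the end of the line does not, I suspect line endings are your problem. A trailing \r would explain your symptoms.
A more robust way to manipulate CSV is to use a tool that already has a full CSV parser built in.
Python has CSV support, for example, and sqlite3 is widely available: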
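To see why a trailing carriage return would produce exactly this symptom, here is a small Python sketch (Python rather than awk, purely for illustration). The sample rows are abbreviated from the question, and both the CRLF line endings and the unterminated last line are assumptions about the file, not something the question confirms. Splitting records on the newline character alone leaves the carriage return attached to the last field, so an exact comparison against `"S"` fails everywhere except on a final line that has no terminator.

```python
# Assumption: the file uses CRLF line endings and its last line has no
# terminator at all. Rows are abbreviated from the question's data.
raw = "1,male,S\r\n2,female,C\r\n891,male,Q"

PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def last_fields(text, strip_cr):
    """Split records on the newline character only; return each last field."""
    fields = []
    for record in text.split("\n"):
        if strip_cr:
            record = record.rstrip("\r")  # drop the stray carriage return
        fields.append(record.split(",")[-1])
    return fields


def expand(fields):
    """Replace a port initial with its full name when it matches exactly."""
    return [PORTS.get(field, field) for field in fields]


broken = expand(last_fields(raw, strip_cr=False))
fixed = expand(last_fields(raw, strip_cr=True))

print(broken)  # only the unterminated last line is expanded
print(fixed)   # every line is expanded once the carriage return is removed
```

If this is indeed the cause, the corresponding fix in the awk script would likely be to remove the carriage return before the comparisons, for example with `sub(/\r$/, "")` as the first statement of the main block.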

#!/bin/sh

sqlite3 >"new.csv" <<'EOD'
.mode csv
.headers on
.import "orig.csv" t
-- String literals use single quotes; double quotes denote identifiers in SQL
update t set
    sex = case
            when sex = 'female' then 'f'
            when sex = 'male'   then 'm'
            else sex
        end,
    embarked = case
            when embarked = 'C' then 'Cherbourg'
            when embarked = 'Q' then 'Queenstown'
            when embarked = 'S' then 'Southampton'
            else embarked
        end
;
select * from t;
EOD
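
As a rough Python counterpart to the sqlite3 script, the following sketch uses the standard `csv` module. The column names `Sex` and `Embarked` come from the question's header row; the function name `convert` and the two mapping tables are illustrative choices, not part of the original answer. Opening both files with `newline=""` lets the csv reader handle `\r\n` line endings itself, so a stray carriage return should not end up in the last field.

```python
import csv

SEX = {"female": "f", "male": "m"}
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def convert(src, dst):
    """Copy the CSV at src to dst, abbreviating Sex and expanding Embarked."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(
            fout, fieldnames=reader.fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            # Values that are not in a mapping are written back unchanged.
            row["Sex"] = SEX.get(row["Sex"], row["Sex"])
            row["Embarked"] = PORTS.get(row["Embarked"], row["Embarked"])
            writer.writerow(row)
```

Calling `convert("orig.csv", "new.csv")` would mirror the file names used in the sqlite3 script. Fields that contain commas, such as the passenger names, are quoted again on output by the writer's default minimal quoting.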

v2g6jxz6 #2

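I suspect you are running this on macOS (or possibly FreeBSD, which is where the macOS version originally came from). Explicitly selecting GNU awk on my FreeBSD box gives me what you want.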

[dev ~/test/awktest]$ gawk -f code.awk data.txt
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,Southampton
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,Cherbourg
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Queenstown
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,Cherbourg
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

(Admittedly, running the FreeBSD awk does not get *either* substitution right...)
