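I'm working with the Titanic CSV and have been trying to replace the values in columns 5 and 12, Sex and Embarked. The Sex values should be m/f instead of male/female, and column 12 should hold the port's full name instead of its first letter.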
The CSV originally looks like this:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,Nan,Q
After the change it should look like this:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,Southampton
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,Cherbourg
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Queenstown
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,Cherbourg
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown
But column 12 is never replaced except on the last line, while the Sex column is replaced correctly on every line:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown
Here is the script:
BEGIN {
    # FPAT (a gawk extension): each field is either a run of non-comma
    # characters or a double-quoted string, so quoted names stay one field
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS = ","
}
{
    # Change the Sex column to "f" if it is "female" or to "m" if it is "male"
    if ($5 == "female")
        $5 = "f"
    else if ($5 == "male")
        $5 = "m"
    # Replace the port letter in the Embarked column with the full port name
    if ($12 == "C")
        $12 = "Cherbourg"
    else if ($12 == "Q")
        $12 = "Queenstown"
    else if ($12 == "S")
        $12 = "Southampton"
    print $0
}
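To clarify: the column 12 values contain no spaces or other characters that would make the match fail, and in Python the same replacement works fine.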
2 Answers
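Since the change in the middle of the line works but the change at the end of the line doesn't, I suspect your line endings are the problem. A trailing \r (carriage return, as left by Windows-style CRLF line endings) would explain your symptoms. A more robust way to manipulate CSV is to use a tool that already has a full CSV parser built in.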
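The trailing-\r diagnosis is easy to reproduce. The sketch below is in Python, uses a made-up three-line sample rather than the real file, and mimics awk's default record separator by splitting on \n alone: the carriage return stays attached to the last field of every line except the final, unterminated one, which matches the symptom in the question. In the awk script itself, a common fix is to remove the carriage return before testing any field, for example by making sub(/\r$/, "") the first statement of the main { ... } block, or to convert the file beforehand with a tool such as dos2unix.

```python
# Hypothetical sample: a header plus two rows with Windows (CRLF) line
# endings; the final line has no terminator, like the end of the asker's file.
RAW = "PassengerId,Embarked\r\n1,S\r\n891,Q"


def last_fields(raw: str, strip_cr: bool = False) -> list[str]:
    """Split on "\\n" only (like awk's default RS) and return each line's last field."""
    result = []
    for line in raw.split("\n"):
        if strip_cr and line.endswith("\r"):
            line = line[:-1]  # same idea as sub(/\r$/, "") in awk
        result.append(line.split(",")[-1])
    return result


print(last_fields(RAW))                 # ['Embarked\r', 'S\r', 'Q']
print(last_fields(RAW, strip_cr=True))  # ['Embarked', 'S', 'Q']
print("S\r" == "S")                     # False: why $12 == "S" never matches
```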
Python supports CSV, for example, as does sqlite3, which is widely available:
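A minimal sketch of that approach with Python's standard csv module, assuming the whole file fits in memory as a string; the function name recode_titanic and the lookup tables are illustrative, not taken from the original answer. Because csv.reader understands both quoted fields and \r\n row endings, neither the commas inside Name nor a trailing carriage return can throw off the comparisons, and columns are looked up by header name instead of position.

```python
import csv
import io

# Recode tables taken from the question: Sex -> m/f, Embarked -> full port name.
SEX = {"male": "m", "female": "f"}
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def recode_titanic(text: str) -> str:
    """Rewrite the Sex and Embarked columns of a Titanic-style CSV string.

    Values not in the tables (e.g. an empty Embarked cell) are left unchanged.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # always write LF endings
    header = next(reader)
    sex_col = header.index("Sex")
    port_col = header.index("Embarked")
    writer.writerow(header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        row[sex_col] = SEX.get(row[sex_col], row[sex_col])
        row[port_col] = PORTS.get(row[port_col], row[port_col])
        writer.writerow(row)  # re-quotes fields that contain commas
    return out.getvalue()
```

For a real file, open it with newline="" as the csv documentation recommends, e.g. with open("titanic.csv", newline="") as f: fixed = recode_titanic(f.read()); the file name here is hypothetical.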
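I suspect you're running this on macOS (or perhaps FreeBSD, which is where the macOS awk originally came from). Explicitly choosing GNU awk on my FreeBSD box gives me the output you want.
(Admittedly, running FreeBSD awk doesn't get *either* replacement right...)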
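To illustrate that last remark: FPAT is a gawk extension, so in an awk that does not support it, assigning FPAT just sets an ordinary variable and records are still split by the default FS, i.e. on runs of blanks. The Python sketch below approximates both behaviours on the first data row; str.split() with no argument stands in for awk's default splitting, and awk_field is a hypothetical helper that mimics awk's $n, including returning an empty string past NF. With blank splitting, $5 and $12 hold neither the sex nor the port code, which is consistent with neither replacement happening.

```python
import csv

LINE = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S'


def awk_field(fields: list[str], n: int) -> str:
    """Return awk-style $n (1-based); awk yields "" for fields past NF."""
    return fields[n - 1] if 0 < n <= len(fields) else ""


# Without FPAT support, awk keeps its default FS=" " and splits on runs of
# blanks; str.split() approximates that for this line.
default_fs = LINE.split()
# A real CSV parser respects the quotes, which is what the FPAT pattern emulates.
csv_fields = next(csv.reader([LINE]))

for name, fields in [("default FS", default_fs), ("CSV-aware", csv_fields)]:
    print(f"{name}: NF={len(fields)} $5={awk_field(fields, 5)!r} $12={awk_field(fields, 12)!r}")
```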