csv: Retrieve data between semicolons that are outside of double quotes

Posted by wvmv3b1j on 2023-04-18 in Other

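My CSV data looks like the sample below. Some columns contain data; columns without data are still delimited by semicolons. This cannot be changed. Here are four example rows: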

first;;;;;
;;Second;;;;
;;;"Third;Fourth";;;
;;"Fifth;Sixth";;;

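I want to get the rows where the number of semicolons is not equal to 6, counting only the semicolons outside double quotes. The third row has exactly six semicolons outside the quotes, so it should not be returned. The fourth row should be returned, because its count of semicolons outside double quotes is not 6.
I am using the following code: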

TARGETFILE=data.csv
variable=$(awk -F ';' 'NF != 7' <$TARGETFILE)

How can I get the rows whose semicolon count (outside double quotes) is not equal to 6?
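
To see where the attempt above goes wrong, here is a minimal sketch that prints the field count a plain -F ';' split assigns to each sample row. The helper names sample_rows and naive_field_counts are hypothetical, introduced only for this illustration:

```shell
# The four sample rows from the question, one per line.
sample_rows() {
  printf '%s\n' \
    'first;;;;;' \
    ';;Second;;;;' \
    ';;;"Third;Fourth";;;' \
    ';;"Fifth;Sixth";;;'
}

# Print the naive field count (every semicolon splits) in front of each row.
naive_field_counts() {
  awk -F ';' '{ print NF ": " $0 }'
}

sample_rows | naive_field_counts
```

The four rows report 6, 7, 8 and 7 fields, so 'NF != 7' returns the first and third rows, whereas the desired result is the first and fourth.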

csv

Source: https://stackoverflow.com/questions/75997353/retrieve-data-between-semicolons-that-are-outside-of-double-quotes

6 Answers

watbbzwu1#

If you have GNU awk, this one-liner should do it:

awk 'BEGIN { FPAT = "\"[^\"]*\"|[^;]*" } NF != 7' file

Alternatively, you can use this sed solution:

sed 'h; s/"[^"]*"//g; s/[^;]//g; /^;;;;;;$/d; x' file

Answered 2023-04-18

xxls0lw82#

With any awk:

$ awk '{x=$0; gsub(/"[^"]*"/,"",x)} gsub(/;/,"",x) != 6' file
first;;;;;
;;"Fifth;Sixth";;;

Or, if you prefer:

$ awk -F';' '{x=$0; gsub(/"[^"]*"/,"")} NF != 7{print x}' file
first;;;;;
;;"Fifth;Sixth";;;

Answered 2023-04-18

luaexgnf3#

If you only need the lines with exactly six semicolons outside double quotes, grep can handle it.

$: cat tst
;;;;;;
bad;;;;;
good;;;;;;
;;;;;;;bad
"is;ok";;;;;;
;;good;;;;
;;;;;;"is;ok"
;this;"not;ok";;;
;;;;;;fine
"not;ok";;;
;;"nope;again";;;
;;;;;;;;;"not;ok"

$: grep -En '^(("[^"]*"|[^;"]*)*;("[^"]*"|[^;"]*)*){6}$' tst
1:;;;;;;
3:good;;;;;;
5:"is;ok";;;;;;
6:;;good;;;;
7:;;;;;;"is;ok"
9:;;;;;;fine

$: grep -Env '^(("[^"]*"|[^;"]*)*;("[^"]*"|[^;"]*)*){6}$' tst
2:bad;;;;;
4:;;;;;;;bad
8:;this;"not;ok";;;
10:"not;ok";;;
11:;;"nope;again";;;
12:;;;;;;;;;"not;ok"

The -n option even gives you the line numbers.

Answered 2023-04-18

jmo0nnb34#

Borrowing Paul's example:

echo '
 ;;;;;;
 bad;;;;;
 good;;;;;;
 ;;;;;;;bad
 "is;ok";;;;;;
 ;;good;;;;
 ;;;;;;"is;ok"
 ;this;"not;ok";;;
 ;;;;;;fine
 "not;ok";;;
 ;;"nope;again";;;
 ;;;;;;;;;"not;ok"' | gcat -n |

awk -F';?"[^"]*";?|;' NF==7

For the original test sample, however, it has to be modified slightly
(NF-7 has the same effect as 'NF != 7', without needing to be quoted for the shell):

echo '
 first;;;;;
 ;;Second;;;;
 ;;;"Third;Fourth";;;
 ;;"Fifth;Sixth";;;' |

awk -F';?"[^"]*"|;' NF-7

first;;;;;
 ;;"Fifth;Sixth";;;
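
The remark about NF-7 relies on awk treating any nonzero pattern value as true. A small sketch, using made-up rows rather than the data above (the rows helper is hypothetical), suggests that the two spellings select the same lines:

```shell
# Three illustrative rows with 7, 3 and 8 semicolon-separated fields.
rows() {
  printf '%s\n' 'a;b;c;d;e;f;g' 'a;b;c' 'a;b;c;d;e;f;g;h'
}

# NF-7 is nonzero (true) exactly when NF differs from 7.
# It contains no shell metacharacters, so it can be passed unquoted.
rows | awk -F';' NF-7

# The comparison form contains spaces and "!", so it needs quotes.
rows | awk -F';' 'NF != 7'
```

Both commands print the 3-field and 8-field rows and skip the 7-field one.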

Answered 2023-04-18

smdnsysy5#

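The CSV format is considerably more complicated than it first appears. For example, I believe the way to put a double quote inside a quoted string is to double it: "". I doubt that the solutions above handle this, but I don't have the energy to analyze them right now. I think getting this right is tough enough that you really need a dedicated program that handles all the edge cases.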
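
As a quick check of the doubt above, the sketch below runs the quote-stripping idea from the second answer on a row that contains a doubled quote. The count_outside helper and the test row are assumptions made for illustration, not part of the original answers:

```shell
# Count the semicolons outside double quotes on each input line (any POSIX awk).
count_outside() {
  awk '{
    x = $0
    gsub(/"[^"]*"/, "", x)     # remove every quoted span from the copy
    print gsub(/;/, "", x)     # gsub returns how many semicolons were left
  }'
}

# A row whose quoted field holds both a doubled quote and a semicolon.
printf '%s\n' ';;"say ""hi;there""";;;;' | count_outside
```

This prints 6: the pattern "[^"]*" sees the field as three adjacent quoted spans, so the semicolon inside it is still removed before counting. It only checks this one case; a quoted field that contains a line break, for example, would still defeat any line-by-line approach.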

Answered 2023-04-18

qgelzfjb6#

Reusing Paul's file:

cat file
;;;;;;
bad;;;;;
good;;;;;;
;;;;;;;bad
"is;ok";;;;;;
;;good;;;;
;;;;;;"is;ok"
;this;"not;ok";;;
;;;;;;fine
"not;ok";;;
;;"nope;again";;;
;;;;;;;;;"not;ok"

You can use Ruby to count the fields:

ruby -r csv -e '$<.each{|line| 
    len=CSV.parse(line, col_sep:";").flatten.length
    puts "#{sprintf("%2s",$.)}: \"#{line.chomp}\" => #{len} fields" 
}' file

Prints:

 1: ";;;;;;" => 7 fields
 2: "bad;;;;;" => 6 fields
 3: "good;;;;;;" => 7 fields
 4: ";;;;;;;bad" => 8 fields
 5: ""is;ok";;;;;;" => 7 fields
 6: ";;good;;;;" => 7 fields
 7: ";;;;;;"is;ok"" => 7 fields
 8: ";this;"not;ok";;;" => 6 fields
 9: ";;;;;;fine" => 7 fields
10: ""not;ok";;;" => 4 fields
11: ";;"nope;again";;;" => 6 fields
12: ";;;;;;;;;"not;ok"" => 10 fields

To filter for the lines that have 7 fields:

ruby -r csv -e '$<.each{|line| 
    len=CSV.parse(line, col_sep:";").flatten.length
    if len==7 then puts line end
}' file

Prints:

;;;;;;
good;;;;;;
"is;ok";;;;;;
;;good;;;;
;;;;;;"is;ok"
;;;;;;fine

Note: the number of field separators is one less than the number of data fields:

1;2;3;4;"five; with sep";6 # six fields, five field separators...
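
The relationship in the note can be checked with any awk. This is only a sketch: count_fields_and_seps is a hypothetical helper, and it assumes every record fits on a single line:

```shell
# Report the field count and the separator count outside quotes for each line.
count_fields_and_seps() {
  awk -F';' '{
    gsub(/"[^"]*"/, "")      # drop quoted spans; awk re-splits the line on ";"
    fields = NF              # fields that remain after the re-split
    seps = gsub(/;/, "")     # semicolons that were outside the quotes
    print "fields=" fields " separators=" seps
  }'
}

printf '%s\n' '1;2;3;4;"five; with sep";6' | count_fields_and_seps
```

For the row from the note this prints fields=6 separators=5.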

Answered 2023-04-18
