csv: Extract only the first occurrence of strings from each line

Posted by wmvff8tz on 2022-12-06 in Other

I have a CSV file of about 5k lines; a sample follows:

apple,tea,salt,fish
apple,oranges,ketchup
...
salad,oreo,lemon
salad,soda,water

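I only need to extract the first line that matches apple or salad, and skip the other lines where those words appear.
I can do something similar with the regex "apple|salad", but that extracts every line where those words are found.
The desired result is: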

apple,tea,salt,fish
salad,oreo,lemon

I can use regex in a text editor and in OpenOffice Calc.

csv

Source: https://stackoverflow.com/questions/74034541/extract-only-first-occurrence-of-strings-from-each-line

3 Answers

mwkjh3gx1#

You can use the excellent Miller and run

mlr --nidx --fs "," filter '$1=~"(apple|salad)"' then head -n 1 -g 1  input.csv

to get

apple,tea,salt,fish
salad,oreo,lemon

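--nidx sets the input and output format to NIDX (fields addressed by position)
--fs "," sets the field separator
filter '$1=~"(apple|salad)"' applies a regex filter to the first field
then head -n 1 -g 1 takes the first record for each distinct value of the first field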
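
For readers without Miller, the same filter-then-first-per-group idea can be sketched in Python. This is only an illustration, not part of the answer: the function name first_rows_by_key and the inline sample data are hypothetical, and it uses an exact comparison on the first field rather than a regex, so a first field such as pineapple is not picked up.

```python
import csv
import io

def first_rows_by_key(lines, keys):
    """Return the first row for each wanted key, judged by the first CSV field."""
    seen = set()
    kept = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        key = row[0]
        # Keep the row only if the key is wanted and has not been seen yet.
        if key in keys and key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical sample mirroring the question's data.
sample = io.StringIO(
    "apple,tea,salt,fish\n"
    "apple,oranges,ketchup\n"
    "salad,oreo,lemon\n"
    "salad,soda,water\n"
)
for row in first_rows_by_key(sample, {"apple", "salad"}):
    print(",".join(row))
```

Unlike the editor-based approaches, this does not require lines with the same first field to be adjacent.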

ctzwtxfj2#

Assuming the lines are sorted:

Press Ctrl+H
Find what: ^(\w+,)(.+\R?)(?:\1(?2))+
Replace with: $1$2
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:

^           # beginning of line
    (\w+,)      # group 1, 1 or more word characters plus the comma (the comma stops the key from matching only a prefix of the first field); use ([^,\r\n]+,) if the first word contains characters other than "word" characters
    (           # start group 2
        .+          # 1 or more of any character but newline
        \R?         # any kind of linebreak, optional
    )           # end group 2
    (?:         # non-capture group
        \1          # backreference to group 1 (i.e. the same word and comma)
        (?2)        # reuse the pattern of group 2, i.e. (.+\R?)
    )+          # end group, may appear 1 or more times
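
The effect of this replacement (keep the first line of each run of consecutive lines that share a first field) can be sketched outside the editor. Python's re module supports neither \R nor the (?2) subroutine call, so the sketch below swaps in itertools.groupby instead of the regex; the function name collapse_runs is hypothetical. Like the regex, it assumes lines with the same first field are adjacent.

```python
from itertools import groupby

def collapse_runs(lines):
    """Keep the first line of each run of consecutive lines sharing a first field."""
    def first_field(line):
        # Everything before the first comma is treated as the key.
        return line.split(",", 1)[0]
    # groupby yields one group per consecutive run; next() takes its first line.
    return [next(group) for _, group in groupby(lines, key=first_field)]

lines = [
    "apple,tea,salt,fish",
    "apple,oranges,ketchup",
    "salad,oreo,lemon",
    "salad,soda,water",
]
print("\n".join(collapse_runs(lines)))
```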

wqlqzqxt3#

In Notepad++, repeatedly run a regex replace of ^(\w+,)(.*)\R\1.*$ with \1\2. Select "Wrap around".
Explanation:

^          Match beginning of line
(\w+,)     Match the leading word plus comma, save to capture group 1
(.*)       Match the rest of the line, save to capture group 2
\R         Match a line break
\1         Match the same leading word plus comma
.*         Match the rest of the line
$          Match the end of the line

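The replacement string keeps only the first line of each matched pair; the second line is discarded.
Demonstration:
Starting with: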

apple,tea1,salt1,fish1
apple,tea2,salt2,fish2
apple,oranges1,ketchup1
apple,oranges2,ketchup2
apple,oranges3,ketchup3
apple,oranges4,ketchup4
salad,oreo1,lemon1
salad,oreo2,lemon2
salad,soda1,water1
salad,soda2,water2

Running "Replace All" with the above expression gives:

apple,tea1,salt1,fish1
apple,oranges1,ketchup1
apple,oranges3,ketchup3
salad,oreo1,lemon1
salad,soda1,water1

Clicking "Replace All" two more times gives:

apple,tea1,salt1,fish1
salad,oreo1,lemon1

Each press of "Replace All" removes about half of the unwanted lines.
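
The repeated passes can be simulated to see how many are needed. The following is a sketch under assumptions, not the answer's own method: it uses Python's re module, where \R is unavailable, so \n stands in for the line break (input with \r\n endings would need normalizing first), and the function name replace_until_stable is hypothetical.

```python
import re

# Same pattern as in the answer, with \n in place of \R.
PAIR = re.compile(r"^(\w+,)(.*)\n\1.*$", re.MULTILINE)

def replace_until_stable(text):
    """Apply the pair-collapsing replacement until nothing changes.

    Returns the final text and the number of passes that changed it.
    """
    passes = 0
    while True:
        new_text = PAIR.sub(r"\1\2", text)
        if new_text == text:
            return text, passes
        text = new_text
        passes += 1

demo = "\n".join([
    "apple,tea1,salt1,fish1",
    "apple,tea2,salt2,fish2",
    "apple,oranges1,ketchup1",
    "apple,oranges2,ketchup2",
    "apple,oranges3,ketchup3",
    "apple,oranges4,ketchup4",
    "salad,oreo1,lemon1",
    "salad,oreo2,lemon2",
    "salad,soda1,water1",
    "salad,soda2,water2",
])
result, passes = replace_until_stable(demo)
print(result)
print(passes)  # 3 for this demo, matching the three "Replace All" clicks
```

Because each pass collapses non-overlapping pairs, the longest run shrinks by roughly half per pass, so the number of passes grows with the logarithm of the longest run.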
