How to split a string on a multi-character delimiter in Bash?

Posted by k4aesqcs on 2023-08-03 in Other


Why doesn't the following Bash code work?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

Expected output:

e
bb
aaddsb


Source: https://stackoverflow.com/questions/40686922/how-to-split-a-string-on-a-multi-character-delimiter-in-bash

4 Answers


dm7nw8vv1#

Since you want newlines, you can simply replace every instance of mm in the string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"
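
As background to the question: split is a utility for cutting its input into output files, not a string tokenizer, so the command substitution in the original loop gives the for loop nothing useful to iterate over. A minimal sketch of the intended loop, built on the replacement above, might look like the following. The helper name split_lines and the array name parts are hypothetical, and the sketch assumes bash 4 or newer (for mapfile) and input that contains no newlines of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the parts of STR, split on SEP, one per line.
# usage: split_lines STR SEP
split_lines() {
    local in=$1 sep=$2 nl=$'\n'
    # Quoting "$sep" makes the separator match literally, not as a glob.
    printf '%s\n' "${in//"$sep"/$nl}"
}

# Read the parts into an array (mapfile needs bash 4+), then loop over them.
mapfile -t parts < <(split_lines 'emmbbmmaaddsb' 'mm')
for part in "${parts[@]}"; do
    echo "$part"
done
```

Reading into an array first keeps each part intact even if it contains spaces, which a plain for i in $( ... ) loop would not.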

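If you want to do this kind of replacement on a longer input stream, you are better off using awk, because bash's built-in string manipulation does not scale well beyond a few kilobytes of content. The gsub_literal shell function given in BashFAQ #21 (which uses awk as its backend) applies here: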

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

In this case, use it as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt


enyaitl32#

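The recommended tool for substitution is sed, with the command s/regexp/replacement/ for the first match of a regexp on each line, or s/regexp/replacement/g for all matches; you do not even need a loop or variables.
Pipe the echo output into sed and replace the characters mm with the newline character \n (GNU sed, the default on Ubuntu, accepts \n in the replacement):
echo "emmbbmmaaddsb" | sed 's/mm/\n/g'
The output is: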

e
bb
aaddsb
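
Writing \n in the replacement is a GNU sed extension. On systems with a different sed, a possible portable sketch writes the newline as a backslash followed by a literal line break. The function name split_mm_portable is hypothetical, and the delimiter mm is hard-coded because an arbitrary delimiter would first need its regex metacharacters escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split STR on the literal "mm", one part per line.
# The replacement is a backslash followed by a real newline, which is the
# POSIX way to insert a newline in the replacement text of s///.
split_mm_portable() {
    printf '%s\n' "$1" | sed 's/mm/\
/g'
}

# Loop over the parts without word splitting or glob expansion.
while IFS= read -r part; do
    echo "$part"
done < <(split_mm_portable 'emmbbmmaaddsb')
```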


lymgl2op3#

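A more general example, which does not replace the multi-character delimiter with a single-character delimiter, is given below.
Using parameter expansion (from a comment by @gniourf_gniourf):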

#!/bin/bash

str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array
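
The same loop can be wrapped in a reusable function. The sketch below is one possible packaging, not part of the original answer: the name split_string is hypothetical, and it prints one field per line instead of filling an array. It also guards against an empty delimiter, which would otherwise make the loop run forever.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each field of STR, split on DELIM, one per line.
# usage: split_string STR DELIM
split_string() {
    local s=$1$2 delimiter=$2
    # An empty delimiter would never shorten $s, so refuse it.
    [[ $delimiter ]] || return 1
    while [[ $s ]]; do
        # Everything before the first occurrence of the delimiter.
        printf '%s\n' "${s%%"$delimiter"*}"
        # Drop that field together with the delimiter that follows it.
        s=${s#*"$delimiter"}
    done
}

split_string "LearnABCtoABCSplitABCaABCString" "ABC"
```

Appending the delimiter to the input first means the last field is handled by the same two expansions as every other field.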

A cruder way:

#!/bin/bash

# main string
str="LearnABCtoABCSplitABCaABCString"

# delimiter string
delimiter="ABC"

#length of main string
strLen=${#str}
#length of delimiter string
dLen=${#delimiter}

#iterator for length of string
i=0
#length tracker for ongoing substring
wordLen=0
#starting position for ongoing substring
strP=0

array=()
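while [ "$i" -lt "$strLen" ]; do
    # quote both sides so empty or space-containing values compare safely
    if [ "$delimiter" = "${str:i:dLen}" ]; then
        array+=( "${str:strP:wordLen}" )
        strP=$(( i + dLen ))
        wordLen=0
        i=$(( i + dLen ))
        # re-check the new position, so back-to-back delimiters are not missed
        continue
    fi
    i=$(( i + 1 ))
    wordLen=$(( wordLen + 1 ))
done
array+=( "${str:strP:wordLen}" )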

declare -p array

Reference: Bash Tutorial - Bash Split String


ca1c2owp4#

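With awk, you can use gsub to replace all matches of a regular expression.
As in your question, to replace every run of two or more 'm' characters with a newline, run: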

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'

e
bb
aaddsb
The 'g' in gsub() stands for "global", meaning replace everywhere.
You can also print only the Nth field, for example:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'

bb
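
The command above works because gsub turns each delimiter into a space, but the field split was already done before the substitution, so it relies on awk re-splitting the modified record. A possibly more direct sketch passes the delimiter to awk as the field separator, since awk treats a multi-character -F value as a regular expression. The function name nth_field is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field N of STR, where fields are separated by
# the regular expression SEP_RE.
# usage: nth_field STR SEP_RE N
nth_field() {
    printf '%s\n' "$1" | awk -F "$2" -v n="$3" '{ print $n }'
}

nth_field 'emmbbmmaaddsb' 'mm+' 2
```

Unlike the space-based version, this keeps fields that themselves contain spaces intact.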
