Note: I wrote several drafts and simplified the regex along the way, so if anything looks inconsistent, that is probably why.

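Do you care about punctuation? For example, with some of the invocations you will see a "word" like (etc) exactly as written, parentheses included. Or a word might come out with its trailing full stop attached, as "parentheses." rather than "parentheses". If you are parsing a file made of proper sentences this can be a problem, especially if you want to sort by word or get a count for each word.
There are many ways to deal with this, but there are caveats and certainly room for improvement, notably dashes (in numbers) and decimal points/dots (in numbers). A precise set of rules might help settle it, but the examples below should give you something to work with. I made some contrived input examples to demonstrate these flaws (or whatever you want to call them).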

$ echo "This is an example sentence with punctuation marks and digits i.e. , . ; \! 7 8 9" | grep -o -E '\<[A-Za-z0-9.]*\>'
This
is
an
example
sentence
with
punctuation
marks
and
digits
i.e
7
8
9

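As you can see, i.e. becomes i.e and the remaining punctuation is not shown at all. Fine, but that pattern breaks up things like version numbers of the form major.minor.revision, e.g. 0.0.1-1; can those be shown intact as well? Yes: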

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '\<[-A-Za-z0-9.]*\>'
The
current
version
is
0.0.1-1
The
previous
version
was
current
from
2017-2018

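Note that the full stops ending the sentences are not part of the output. What happens if you add a space between the years and the dash? You lose the dash, and each year ends up on its own line: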

$ echo "2017 - 2018" | grep -o -E '\<[-A-Za-z0-9.]*\>'
2017
2018

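The question then becomes whether you want the - counted on its own; by the nature of word separation, if there are spaces you will not get the years as a single string. Since the dash is not a word by itself, I would say no.
I am sure these patterns could be simplified further. Also, if you do not want any punctuation or digits, you can change the pattern to: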

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '\<[A-Za-z]*\>'
The
current
version
is
The
previous
version
was
current
from

If you want the digits as well:

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '\<[A-Za-z0-9]*\>'
The
current
version
is
0
0
1
1
The
previous
version
was
current
from
2017
2018

As for "words" that contain both letters and digits, that is another thing that may or may not matter to you. As a demonstration:

$ echo "The current version is 0.0.1-1. test1."|grep -o -E '\<[A-Za-z0-9]*\>'
The
current
version
is
0
0
1
1
test1

The invocation above outputs them. The following does not (because it does not consider digits at all):

$ echo "The current version is 0.0.1-1. test1."|grep -o -E '\<[A-Za-z]*\>'
The
current
version
is

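It is easy enough to ignore punctuation, but in some cases you may need or want to keep it. In the case of e.g., I suppose you could use, say, sed to change lines such as e.g back to e.g., but I guess that is personal preference.
I could summarise how it works, but only briefly; I am too tired to think it through in depth.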
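
The sed idea mentioned above could look something like the following. This is only a sketch: restore_abbrev is a hypothetical helper name, and the list of abbreviations (just e.g and i.e here) is an assumption you would extend to suit your own text.

```shell
# restore_abbrev (hypothetical helper): re-append the final dot to lines that
# consist solely of a known abbreviation. The list is an assumption: e.g, i.e.
restore_abbrev() {
    sed -E 's/^(e\.g|i\.e)$/\1./'
}

# One word per line, then put the dots back on the abbreviations.
echo "This is e.g. an example i.e. a test." |
    grep -o -E '\<[-A-Za-z0-9.]*\>' |
    restore_abbrev
```

Because the substitution is anchored with ^ and $, it only touches lines that are exactly one of the listed abbreviations and leaves every other word alone.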

How does it work?

I will only explain the invocation grep -o -E '\<[-A-Za-z0-9.]*\>', but most of it applies to the others too (the vertical bar/pipe symbol in extended grep allows multiple patterns):

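The -o option prints only the matches rather than the whole line. -E selects extended grep (you could also use egrep). As for the regexp itself:

\< and \> are word boundaries (start and end respectively; you can specify just one if needed). I believe the -w option is the same as specifying both, though the invocation may be a bit different (I do not actually know).
'\<[-A-Za-z0-9.]*\>' means a dash, upper- and lowercase letters, digits and a dot, zero or more times. As for why e.g. turns into e.g: \> can only match immediately after a word character (a letter, digit or underscore), and the dot is not one, so the match has to stop at the g.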
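
To see the effect of \> for yourself, it may help to run the same character class with and without the closing boundary. The two function names below are made up for this illustration, and the sketch assumes a grep that supports \< and \> (GNU grep does).

```shell
# with_end_boundary: the pattern discussed above; a match must end on a word character.
with_end_boundary() {
    grep -o -E '\<[-A-Za-z0-9.]*\>'
}

# without_end_boundary: same character class but no \>, so the greedy *
# keeps any trailing dots.
without_end_boundary() {
    grep -o -E '\<[-A-Za-z0-9.]*'
}

echo "e.g. done." | with_end_boundary      # prints e.g and done
echo "e.g. done." | without_end_boundary   # prints e.g. and done.
```

Dropping \> keeps the dot on abbreviations, but it also keeps the full stop at the end of every sentence, which is the trade-off the examples earlier run into.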

Bonus script for counting word frequency

#!/bin/bash

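if [ $# -eq 0 ]; then
    echo "Usage: $(basename "${0}") <FILE> [FILE...]" >&2
    exit 1
fi

# Loop over every file named on the command line.
for file do
    if [ -e "${file}" ]
    then
        echo "** ${file}: "
        # One word per line, then count duplicates, most frequent first.
        grep -o -E '\<[-A-Za-z0-9.]*\>' "${file}" | sort | uniq -c | sort -rn
    else
        # Report the file being processed, not always the first argument.
        echo >&2 "${file}: file not found"
    fi
done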

Example:

$ cat example 
The current version is 0.0.1-1 but the previous version was non-existent.

This sentence contains an abbreviation i.e. e.g. (so actually two abbreviations).

This sentence has no numbers and no punctuation  
$ ./wordfreq example 
** example: 
   2 version
   2 sentence
   2 no
   2 This
   1 was
   1 two
   1 the
   1 so
   1 punctuation
   1 previous
   1 numbers
   1 non-existent
   1 is
   1 i.e
   1 has
   1 e.g
   1 current
   1 contains
   1 but
   1 and
   1 an
   1 actually
   1 abbreviations
   1 abbreviation
   1 The
   1 0.0.1-1

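Note: I did not convert uppercase to lowercase, so 'The' and 'the' show up as different words. If you want everything in lowercase, change the grep invocation in the script so that it pipes through tr before sorting: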

grep -o -E '\<[-A-Za-z0-9.]*\>' "${file}" | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn

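Oh, and since you asked about writing the result to a file, you can simply append this to the command line (this is for the original invocations):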

> output_file

For the script, you can use it like this:

$ ./wordfreq file1 file2 file3 > output_file

8 Answers


9lowa7mx1#

There are several ways to do it; pick the one you like best!

echo "This is for example" | tr ' ' '\n' > example.txt

Or do it like this to avoid an unnecessary echo:

tr ' ' '\n' <<< "This is for example" > example.txt

The <<< notation denotes a here-string.
Alternatively, use sed instead of tr:

sed "s/ /\n/g" <<< "This is for example" > example.txt
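
One thing to watch with the one-liners above: they replace each single space, so runs of several spaces produce empty lines and tabs are left untouched. A possible variation, assuming you want any run of whitespace treated as one separator, is to let tr squeeze its output (split_words is a hypothetical name):

```shell
# split_words (hypothetical): turn every run of whitespace into a single newline.
# tr first maps each whitespace character to a newline, then -s squeezes
# repeated newlines into one.
split_words() {
    tr -s '[:space:]' '\n'
}

printf 'This  is \t for example\n' | split_words
```

Leading whitespace in the input would still yield one empty first line, so trim it beforehand if that matters.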

For more options, check the other answers =)

2022-11-25

ubbxdtey2#

$ echo "This is for example" | xargs -n1
This
is
for
example

jqjz2hbq3#

Try using:

string="This is for example"

printf '%s\n' $string > filename.txt

Or take advantage of bash word splitting:

string="This is for example"

for word in $string; do
    echo "$word"
done > filename.txt
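
Both variants rely on leaving $string unquoted, which triggers filename expansion as well as word splitting, so a word such as * would be replaced by the names of the files in the current directory. If that matters, one possible safeguard is to switch globbing off around the expansion. The function name below is made up for this sketch.

```shell
# one_per_line (hypothetical): split a string on $IFS without filename expansion.
one_per_line() {
    (
        set -f              # disable globbing, only inside this subshell
        printf '%s\n' $1    # unquoted on purpose: word splitting is wanted here
    )
}

one_per_line "This is * for example"
```

Running set -f inside a subshell keeps the setting from leaking into the rest of the script.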

gg0vcinb4#

example="This is for example"
printf "%s\n" $example

xoshrz7s5#

aor9mmx16#

Use the fmt command:

>> echo "This is for example" | fmt -w1 > textfile.txt ; cat textfile.txt
This
is
for
example

For a full description of fmt and its options, see the fmt man page.

ia2d9nvy7#

Try using:

str="This is for example"
echo -e ${str// /\\n} > file.out

Output:

> cat file.out 
This
is
for
example
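
Note that echo -e also interprets any backslash sequences already present in the string, and the unquoted expansion is open to globbing. A possible alternative, assuming bash, substitutes a real newline and prints the result with a quoted printf (spaces_to_newlines is a hypothetical name):

```shell
# spaces_to_newlines (hypothetical): print the argument with every space
# replaced by a newline, without echo -e and without an unquoted expansion.
spaces_to_newlines() {
    local nl=$'\n'                  # a literal newline character
    printf '%s\n' "${1// /$nl}"     # ${var// /x} replaces all spaces (bash)
}

spaces_to_newlines "This is for example"
```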

ux6nzvsh8#

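Nobody has suggested bash's builtin read command yet. With it:
The data is always fully quoted, so it is not subject to filename expansion.
The current value of $IFS controls the splitting. The default is space-tab-newline: IFS=$' \t\n'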
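
A minimal sketch of this approach, assuming bash and a single line of input. The variable names are placeholders, and this is not necessarily the exact code the answer had in mind.

```shell
example="This is * for example"

# -r keeps backslashes literal; -a stores the split words in an array.
# The here-string is quoted, so the * is never expanded to filenames.
read -r -a words <<< "$example"

# Print one array element per line.
printf '%s\n' "${words[@]}"
```

To split on something other than whitespace, IFS can be set for the read command alone, for example IFS=, read -r -a words <<< "a,b,c".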

shell: one word per line
