regex: How to parse a string into words and punctuation using JavaScript

Posted by 2ul0zpep on 2023-05-01 in Java

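I have a string test = "hello how are you all doing, I hope that it's good! and fine. Looking forward to see you."
I am trying to parse the string into words and punctuation marks using JavaScript. I can separate the words, but the punctuation marks disappear when I use this regex:
var result = test.match(/\b(\w|')+\b/g);
So my expected output is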

hello
how 
are 
you
all
doing
,
I
hope
that
it's
good
!
and 
fine
.
Looking
forward
to
see
you

Answer 1 (pkbketx9)

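The simple approach

This first approach works if your definition of a "word" matches JavaScript's. A more customizable approach follows below.
Try test.split(/\s*\b\s*/). It splits on word boundaries (\b) and consumes the whitespace around them.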

"hello how are you all doing, I hope that it's good! and fine. Looking forward to see you."
    .split(/\s*\b\s*/);
// Returns:
["hello",
"how",
"are",
"you",
"all",
"doing",
",",
"I",
"hope",
"that",
"it",
"'",
"s",
"good",
"!",
"and",
"fine",
".",
"Looking",
"forward",
"to",
"see",
"you",
"."]

How it works.

var test = "This is. A test?"; // Test string.

// First consider splitting on word boundaries (\b).
test.split(/\b/); //=> ["This"," ","is",". ","A"," ","test","?"]
// This almost works but there is some unwanted whitespace.

// So we change the split regex to gobble the whitespace using \s*
test.split(/\s*\b\s*/) //=> ["This","is",".","A","test","?"]
// Now the whitespace is included in the separator
// and not included in the result.
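
As a minimal sketch of how the split above could be wrapped for reuse: the function name splitOnWordBoundaries is hypothetical and not part of the original answer, and the trim and filter steps are an added assumption. They drop the empty strings that split can return when the input is empty or has leading or trailing whitespace.

```javascript
// Hypothetical helper; the name and the trim/filter steps are assumptions,
// only the regex comes from the answer above.
function splitOnWordBoundaries(text) {
  return text
    .trim()                     // outer whitespace would otherwise produce "" tokens
    .split(/\s*\b\s*/)          // split on word boundaries, consuming whitespace
    .filter(function (token) {  // "".split(...) yields [""], so drop empty tokens
      return token.length > 0;
    });
}

console.log(splitOnWordBoundaries("  This is. A test?  "));
// ["This", "is", ".", "A", "test", "?"]
console.log(splitOnWordBoundaries("it's"));
// ["it", "'", "s"]
```

The second call shows the limitation of this approach: an apostrophe is not a word character for \b, so a contraction is split into three tokens.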

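A more complex solution.

If you want words like "isn't" and "one-thousand" to be treated as a single word, whereas JavaScript's regex treats them as two, you need to create your own definition of a word.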

test.match(/[\w-']+|[^\w\s]+/g) //=> ["This","is",".","A","test","?"]

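How it works

This matches the actual words and the punctuation characters separately, using alternation. The first half of the regex, [\w-']+, matches whatever you consider to be a word, and the second half, [^\w\s]+, matches whatever you consider to be punctuation. In this example I simply used anything that is not a word character or whitespace. I also put a + on the end so that multi-character punctuation (such as ?!, which is properly written ‽) is treated as a single token. If you don't want that, remove the +.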
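
A short sketch of the custom word definition in use. The names WORD_OR_PUNCT and tokenize are hypothetical. The class is written here as [\w'-] with the hyphen last, which matches the same characters as [\w-'] but cannot be misread as a range. The "|| []" fallback is an added assumption to handle input with no matches.

```javascript
// Hypothetical names; the pattern is the one from the answer, with the hyphen
// moved to the end of the character class so it is clearly a literal hyphen.
var WORD_OR_PUNCT = /[\w'-]+|[^\w\s]+/g;

function tokenize(text) {
  // String.prototype.match returns null when nothing matches.
  return text.match(WORD_OR_PUNCT) || [];
}

console.log(tokenize("Isn't one-thousand enough?!"));
// ["Isn't", "one-thousand", "enough", "?!"]

// Without the trailing +, each punctuation character becomes its own token.
console.log("enough?!".match(/[\w'-]+|[^\w\s]/g));
// ["enough", "?", "!"]
```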


Answer 2 (xeufq47z)

Use this:

[,.!?;:]|\b[a-z']+\b

See the matches in the demo.
For example, in JavaScript:

resultArray = yourString.match(/[,.!?;:]|\b[a-z']+\b/ig);

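Explanation

  • The character class [,.!?;:] matches one of the characters inside the brackets
  • OR (the alternation |)
  • \b matches a word boundary
  • [a-z']+ matches one or more letters or apostrophes
  • \b matches a word boundary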
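
To illustrate the pieces listed above, here is a small sketch that applies the pattern to the string from the question. The variable names are hypothetical, and the "|| []" fallback is an added assumption for input with no matches.

```javascript
// Hypothetical variable names; the pattern is the one given in this answer.
var sentence = "hello how are you all doing, I hope that it's good! and fine. " +
  "Looking forward to see you.";
var pattern = /[,.!?;:]|\b[a-z']+\b/ig;

// match returns null when nothing matches, so fall back to an empty array.
var tokens = sentence.match(pattern) || [];
console.log(tokens.join(" | "));
// hello | how | are | you | all | doing | , | I | hope | that | it's | good | ! | and | fine | . | Looking | forward | to | see | you | .

// The word part only allows letters and apostrophes, so digits are skipped.
console.log("room 42".match(pattern));
// ["room"]
```

Unlike the word-boundary split, this keeps "it's" together, but any character outside the two classes (digits, hyphens, other symbols) is silently dropped, so the classes may need extending for other input.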
