Regex in Java behaves differently from regex in Scala: same code/logic, different output

dfddblmv  posted on 2023-03-09  in Java

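I am converting existing Scala code that uses a regular expression to check whether the input contains any target word/phrase/string. I know helper functions such as contains and substring exist, but I need a regex solution.
Here is the Scala code: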

val NonCharacter: String = "[^\\p{L}\\p{M}]"
val targetWord = "phone"

val regexMiddle = s".*(${NonCharacter}${targetWord}${NonCharacter}).*"
val regexLeft = s"(${targetWord}${NonCharacter}).*"
val regexRight = s".*(${NonCharacter}${targetWord})"
val regexSame = s"(${targetWord})"
val exactRegex = s"${regexMiddle}|${regexLeft}|${regexRight}|${regexSame}"

"My phone is blue".matches(exactRegex) // returns true (EXPECTED)
"i have 1 phone".matches(exactRegex) // returns true (EXPECTED)
"phone2".matches(exactRegex) // returns false (EXPECTED)
"phone!".matches(exactRegex) // returns false (EXPECTED)
"phone ".matches(exactRegex) // returns false (EXPECTED)

Here is the Java code:

String nonCharacter = "[^\\p{L}\\p{M}]";
String targetWord = "phone";

String regexMiddle = String.format(".*(%s%s%s).*", nonCharacter, targetWord, nonCharacter);
String regexLeft = String.format("(%s%s).*", targetWord, nonCharacter);
String regexRight = String.format(".*(%s%s)", nonCharacter, targetWord);
String regexSame = String.format("(%s)", targetWord);
String exactRegex = String.format("%s|%s|%s|%s", regexMiddle, regexLeft, regexRight, regexSame);

System.out.println("My phone is blue".matches(exactRegex)); // returns true (EXPECTED)
System.out.println("i have 1 phone".matches(exactRegex)); // returns true (EXPECTED)
System.out.println("phone2".matches(exactRegex)); // returns true (NOT EXPECTED)
System.out.println("phone!".matches(exactRegex)); // returns true (NOT EXPECTED)
System.out.println("phone ".matches(exactRegex)); // returns true (NOT EXPECTED)

Do you know why this difference exists, and how I should handle edge cases I have not seen yet?

4c8rllxm  #1

Your expectations are strange:

"phone2".matches("phone[^\\p{L}\\p{M}]"); // returns true (NOT EXPECTED)
"phone!".matches("phone[^\\p{L}\\p{M}]"); // returns true (NOT EXPECTED)
"phone ".matches("phone[^\\p{L}\\p{M}]"); // returns true (NOT EXPECTED)

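"2", "!" and " " are neither letters nor marks, so I would expect true as the result of these checks. The real question is why Scala would treat these characters as letters or marks.
\p{L} or \p{Letter}: any kind of letter from any language
\p{M} or \p{Mark}: a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
Source: Regular-Expressions.info.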
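
To make the point above concrete, here is a minimal Java sketch that checks each of the three trailing characters against the two Unicode classes and reports which alternative of the question's pattern matches each input. The class name PhoneRegexCheck and the helpers isLetterOrMark and matchingBranch are hypothetical names made up for this illustration; the four patterns are the ones from the question, built by plain concatenation instead of String.format.

```java
import java.util.regex.Pattern;

class PhoneRegexCheck {
    // Same character class as in the question: anything that is neither a letter nor a mark.
    static final String NON_CHARACTER = "[^\\p{L}\\p{M}]";
    static final String TARGET_WORD = "phone";

    // The four alternatives from the question, kept separate so each can be tested alone.
    static final String REGEX_MIDDLE = ".*(" + NON_CHARACTER + TARGET_WORD + NON_CHARACTER + ").*";
    static final String REGEX_LEFT = "(" + TARGET_WORD + NON_CHARACTER + ").*";
    static final String REGEX_RIGHT = ".*(" + NON_CHARACTER + TARGET_WORD + ")";
    static final String REGEX_SAME = "(" + TARGET_WORD + ")";

    // Hypothetical helper: true if the one-character string is a Unicode letter or mark.
    static boolean isLetterOrMark(String ch) {
        return Pattern.matches("[\\p{L}\\p{M}]", ch);
    }

    // Hypothetical helper: names the first alternative that matches the whole input, or "none".
    static String matchingBranch(String input) {
        if (input.matches(REGEX_MIDDLE)) return "middle";
        if (input.matches(REGEX_LEFT)) return "left";
        if (input.matches(REGEX_RIGHT)) return "right";
        if (input.matches(REGEX_SAME)) return "same";
        return "none";
    }

    public static void main(String[] args) {
        for (String ch : new String[] {"2", "!", " ", "e"}) {
            System.out.println("'" + ch + "' is letter or mark: " + isLetterOrMark(ch));
        }
        String[] inputs = {"My phone is blue", "i have 1 phone", "phone2", "phone!", "phone "};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + matchingBranch(input));
        }
    }
}
```

In this sketch "phone2", "phone!" and "phone " are all accepted by the left alternative, because the character after "phone" is not a letter or a mark and therefore satisfies the negated class. Note also that on the JVM a Scala String is a java.lang.String, so matches should be the very same method in both snippets; if the Scala version really returns false, it is worth checking that the pattern string it is run with is identical to the Java one.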
