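I am converting existing Scala code that uses regular expressions to check whether an input contains any of a set of target words, phrases, or strings. I know that helper functions such as contains or substring exist, but I need a regex-based solution.
Here is the Scala code: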
val NonCharacter: String = "[^\\p{L}\\p{M}]"
val targetWord = "phone"
var regexMiddle = s".*(${NonCharacter}${targetWord}${NonCharacter}).*"
var regexLeft = s"(${targetWord}${NonCharacter}).*"
var regexRight = s".*(${NonCharacter}${targetWord})"
var regexSame = s"(${targetWord})"
var exactRegex = s"${regexMiddle}|${regexLeft}|${regexRight}|${regexSame}"
"My phone is blue".matches(exactRegex) // returns true (EXPECTED)
"i have 1 phone".matches(exactRegex) // returns true (EXPECTED)
"phone2".matches(exactRegex) // returns false (EXPECTED)
"phone!".matches(exactRegex) // returns false (EXPECTED)
"phone ".matches(exactRegex) // returns false (EXPECTED)
Here is the Java code:
String nonCharacter = "[^\\p{L}\\p{M}]";
String targetWord = "phone";
String regexMiddle = String.format(".*(%s%s%s).*", nonCharacter, targetWord, nonCharacter);
String regexLeft = String.format("(%s%s).*", targetWord, nonCharacter);
String regexRight = String.format(".*(%s%s)", nonCharacter, targetWord);
String regexSame = String.format("(%s)", targetWord);
String exactRegex = String.format("%s|%s|%s|%s", regexMiddle, regexLeft, regexRight, regexSame);
System.out.print("My phone is blue".matches(exactRegex)); // returns true (EXPECTED)
System.out.print("i have 1 phone".matches(exactRegex)); // returns true (EXPECTED)
System.out.print("phone2".matches(exactRegex)); // returns true (NOT EXPECTED)
System.out.print("phone!".matches(exactRegex)); // returns true (NOT EXPECTED)
System.out.print("phone ".matches(exactRegex)); // returns true (NOT EXPECTED)
Do you know why this difference exists, and how to handle unseen edge cases?
1 Answer
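Your expectations are strange: "2", "!", and " " are neither letters nor marks, so I would expect true from this check. The real question is why Scala treats these characters as letters or marks.
\p{L} or \p{Letter}: any kind of letter from any language.
\p{M} or \p{Mark}: a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).
Source: regular-expressions.info.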
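To make this concrete, here is a minimal Java sketch that isolates the left-anchored alternative from the question (the regexLeft pattern) and checks single characters against the negated class. The class name PhoneRegexDemo and the helpers regexLeft and isNonCharacter are illustrative assumptions, not part of the original post.

```java
// Minimal sketch; PhoneRegexDemo, regexLeft and isNonCharacter are hypothetical names.
final class PhoneRegexDemo {
    // Same negated class as in the question: anything that is neither a letter nor a mark.
    static final String NON_CHARACTER = "[^\\p{L}\\p{M}]";

    // The question's regexLeft: the target word, one non-letter/non-mark character, then anything.
    static String regexLeft(String targetWord) {
        return String.format("(%s%s).*", targetWord, NON_CHARACTER);
    }

    // True when the whole string is one character that is neither \p{L} nor \p{M}.
    static boolean isNonCharacter(String ch) {
        return ch.matches(NON_CHARACTER);
    }

    public static void main(String[] args) {
        String left = regexLeft("phone");
        // '2', '!' and ' ' all satisfy the negated class, so these three inputs match.
        System.out.println("phone2".matches(left)); // true
        System.out.println("phone!".matches(left)); // true
        System.out.println("phone ".matches(left)); // true
        // 's' is a letter, so the word is not followed by a non-character here.
        System.out.println("phones".matches(left)); // false

        System.out.println(isNonCharacter("2"));      // true: a digit is not a letter or mark
        System.out.println(isNonCharacter("e"));      // false: a letter
        System.out.println(isNonCharacter("\u0301")); // false: combining acute accent is a mark
    }
}
```

Because the full pattern is an alternation, one matching branch is enough for matches to return true. If the intent were to treat digits as part of a word, one option might be to add \p{N} to the negated class, but that is an assumption about the desired behaviour and would still accept "phone!" and "phone ".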