regex - How to get the name of a regex named capturing group from a match in Java?

Posted by uxhixvfz on 2023-06-07 in Java

Given:

String text = "FACEBOOK is buying GOOGLE and FACE BOOK";

and:

Pattern pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))");
Matcher matcher = pattern.matcher(text);

I would like to get something like this:

Group=FB matches substring="FACEBOOK" at position=[0, 8)
Group=GOOGL matches substring="GOOGLE" at position=[19, 25)
Group=FB matches substring="FACE BOOK" at position=[30, 39)

However, I have not been able to get the group name. Here is my attempt in Scala:

import java.util.regex.Pattern

val pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
val text = "FACEBOOK is buying GOOGLE and FACE BOOK"
val matcher = pattern.matcher(text)

while (matcher.find()) {
  println(s"Group=???? matches substring=${matcher.group()} at position=[${matcher.start},${matcher.end})")
}

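EDIT: Someone marked this as a duplicate of Get group names in java regex, but that is a different question. This one asks how to find the group name given a MATCH; the other asks how to get the group names as Strings (or indices) given a Pattern object.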
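For comparison, when the group names are already known (they are spelled out in the pattern literal), a minimal workaround that needs no Java 20 API is to probe each candidate name with Matcher.group(String), which returns null for a named group that did not take part in the current match. This is a sketch under that assumption: the class name KnownGroupNames and the hand-copied GROUP_NAMES list are illustrative, not part of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KnownGroupNames {
    // Assumption: these names are copied by hand from the pattern literal below.
    public static final List<String> GROUP_NAMES = Arrays.asList("FB", "GOOGL");

    // Returns the first listed name whose group took part in the current match, or null.
    public static String currentGroupName(Matcher matcher, List<String> names) {
        for (String name : names) {
            // group(String) returns null when this named group did not participate,
            // and throws IllegalArgumentException if the pattern has no such group.
            if (matcher.group(name) != null) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "FACEBOOK is buying GOOGLE and FACE BOOK";
        Pattern pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.printf("Group=%s matches substring=\"%s\" at position=[%d, %d)%n",
                    currentGroupName(matcher, GROUP_NAMES),
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}
```

Run on the question's input, this prints the three lines shown above. The drawback is that the name list must be kept in sync with the pattern by hand, which is what the answers below try to avoid.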


Answer #1 by gfttwv5a

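You can use the named-regexp Java library. It is a thin wrapper around java.util.regex that supports named capturing groups. It mainly targets pre-Java 7 users, but it also contains methods for inspecting group names (which still seem to be missing even in Java 11):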

  • Pattern#groupNames
  • Matcher#namedGroups

Answer #2 by xqnpmsa8

Here is my attempt in Scala:

import java.util.regex.{MatchResult, Pattern}

class GroupNamedRegex(pattern: Pattern, namedGroups: Set[String]) {
  def this(regex: String) = this(Pattern.compile(regex), 
    "\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>".r.findAllMatchIn(regex).map(_.group(1)).toSet)

  def findNamedMatches(s: String): Iterator[GroupNamedRegex.Match] = new Iterator[GroupNamedRegex.Match] {
    private[this] val m = pattern.matcher(s)
    private[this] var _hasNext = m.find()

    override def hasNext = _hasNext

    override def next() = {
      val ans = GroupNamedRegex.Match(m.toMatchResult, namedGroups.find(group => m.group(group) != null))
      _hasNext = m.find()
      ans
    }
  }
}

object GroupNamedRegex extends App {
  case class Match(result: MatchResult, groupName: Option[String])

  val r = new GroupNamedRegex("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
  println(r.findNamedMatches("FACEBOOK is buying GOOGLE and FACE BOOK FB").map(s => s.groupName -> s.result.group()).toList)
}
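The same approach can be sketched in plain Java for readers who are not on Java 20: scan the regex source once for "(?<name>" declarations, then, for each match, keep the first name whose group is non-null. The names GroupNamedRegex, NamedMatch and findNamedMatches mirror the Scala answer but are otherwise illustrative, and the source scan is a heuristic that assumes the regex contains no escaped "\(?<" sequence and no "(?<" inside a character class.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNamedRegex {
    // Heuristic from the Scala answer: a named group is declared as "(?<name>".
    // Lookbehinds "(?<=" and "(?<!" are skipped because a letter must follow "<".
    private static final Pattern NAME_DECL = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    private final Pattern pattern;
    private final Set<String> groupNames = new LinkedHashSet<>(); // keeps declaration order

    public GroupNamedRegex(String regex) {
        this.pattern = Pattern.compile(regex);
        Matcher decl = NAME_DECL.matcher(regex);
        while (decl.find()) {
            groupNames.add(decl.group(1));
        }
    }

    public Set<String> groupNames() {
        return groupNames;
    }

    /** A snapshot of one match plus the name of the group that produced it (null if none). */
    public static final class NamedMatch {
        public final String groupName;
        public final MatchResult result;

        NamedMatch(String groupName, MatchResult result) {
            this.groupName = groupName;
            this.result = result;
        }

        @Override
        public String toString() {
            return groupName + "=" + result.group();
        }
    }

    public List<NamedMatch> findNamedMatches(CharSequence input) {
        List<NamedMatch> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            String name = null;
            for (String candidate : groupNames) {
                // group(String) is null when this named group did not participate.
                if (m.group(candidate) != null) {
                    name = candidate;
                    break;
                }
            }
            // toMatchResult() copies the current state; the Matcher changes on the next find().
            matches.add(new NamedMatch(name, m.toMatchResult()));
        }
        return matches;
    }

    public static void main(String[] args) {
        GroupNamedRegex r = new GroupNamedRegex("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))");
        // prints [FB=FACEBOOK, GOOGL=GOOGLE, FB=FACE BOOK]
        System.out.println(r.findNamedMatches("FACEBOOK is buying GOOGLE and FACE BOOK FB"));
    }
}
```

Unlike the Scala version, this collects matches eagerly into a List instead of returning a lazy Iterator. That is simpler, but it scans the whole input up front. Storing m.toMatchResult() rather than the Matcher itself is what keeps each recorded start/end/group stable across later find() calls.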

Answer #3 by zbq4xfa0

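Java 20 adds a namedGroups() method to MatchResult (implemented by Matcher), which maps each group name to its group number. In your example, it can be used to find the name of the group that produced the current match; this requires Java 20 or later.
Here is a Java implementation: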

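import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    public static void main(String[] args) {
        String text = "FACEBOOK is buying GOOGLE and FACE BOOK";
        Matcher matcher = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))").matcher(text);
        while (matcher.find()) {
            System.out.printf("Group=%s matches substring=%s at position=[%s,%s)%n",
                    getCurrentGroupName(matcher),
                    matcher.group(), matcher.start(), matcher.end());
        }
    }

    // Java 20+: returns the first named group that participated in the current match, or null.
    private static String getCurrentGroupName(Matcher matcher) {
        return matcher.namedGroups().keySet().stream()
                .filter(n -> matcher.group(n) != null)
                .findFirst().orElse(null);
    }
}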
