Regex matching strings with a mixture of Japanese and English characters

Posted by k7fdbhmy on 2023-06-25 in Other

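I have this script in PowerShell that I will eventually use to translate an XML file: the file contains some Japanese words, which are to be replaced with English. For now, here is a simple regex-matching example: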

$pattern = "(?<=\>)[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]+(?=\<\/)"
$text = 'tag3>日本語</tag>漢字</tag>.'

# $Matches is an automatic variable (populated by the -match operator), so use a different name.
$found = $text | Select-String -Pattern $pattern -AllMatches | ForEach-Object { $_.Matches.Value }

$found

This works fine and returns the following:

日本語
漢字

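However, I want it to also capture zero or more English characters before or after the Japanese characters, with the whole thing enclosed between > and </. For example, for this string: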

tag3>Some text before 日本語 and some text after</tag><Before text 漢字</tag>

it should capture these:

Some text before 日本語 and some text after
Before text 漢字

Source: https://stackoverflow.com/questions/76452529/regex-matching-strings-with-mixture-of-japanese-and-english-characters

1 Answer

r9f1avp51

Obligatory general advice:

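String-based parsing of XML text is best avoided, because it is inherently limited and brittle. It is better to use a dedicated XML parser, such as .NET's System.Xml.XmlDocument class. PowerShell makes that class easily accessible through its [xml] type accelerator and its property-based adaptation of the XML DOM; see, for example, this answer.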
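To make that advice concrete, here is a minimal sketch of the parser-based approach. It assumes the input is well-formed XML; the question's sample string is not. The sample document, the `$translations` table, and the variable names are hypothetical and used for illustration only. `SelectNodes` with the XPath `//text()` is standard `System.Xml.XmlDocument` functionality, and `$doc.root.tag3` uses PowerShell's property-based XML adaptation.

```powershell
# Minimal sketch, assuming well-formed XML input. The document below is a
# hypothetical, corrected version of the question's sample (which is not valid XML).
$xmlText = @'
<root>
  <tag3>Some text before 日本語 and some text after</tag3>
  <tag>Before text 漢字</tag>
  <tag>English only</tag>
</root>
'@

# [xml] is PowerShell's type accelerator for System.Xml.XmlDocument.
$doc = [xml] $xmlText

# Same Unicode blocks as in the question's pattern. No lookarounds are needed,
# because the parser has already separated each element's text from the markup.
$japanese = '[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]'

# XPath '//text()' selects every text node; keep those that contain Japanese characters.
$found = @(
  $doc.SelectNodes('//text()') |
    Where-Object { $_.Value -match $japanese } |
    ForEach-Object { $_.Value }
)
$found

# Hypothetical translation table, for illustration only.
$translations = @{ '日本語' = 'Japanese'; '漢字' = 'kanji' }

# Text nodes can be modified in place; element names and attributes are never touched.
foreach ($node in $doc.SelectNodes('//text()')) {
  foreach ($key in $translations.Keys) {
    $node.Value = $node.Value.Replace($key, $translations[$key])
  }
}

# Property-based access via PowerShell's XML adaptation shows the updated text.
$doc.root.tag3
```

Because the parser handles the markup, the regex only has to recognize Japanese characters. To write the result back to disk, call `$doc.Save()` with a full path. .NET resolves relative paths against its own current directory, which can differ from PowerShell's current location.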

You can refine your regex as follows:

$pattern = '(?<=[^/]>)[^>\P{IsBasicLatin}]*[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]+[^>\P{IsBasicLatin}]*(?=</)'

$text = '<tag3>Some text before 日本語 and some text after</tag3><tag>Before text 漢字</tag>.'

# Outputs directly to the console for diagnostic purposes.
$text |
  Select-String -Pattern $pattern -AllMatches |
  ForEach-Object { $_.Matches.Value }

Output:

Some text before 日本語 and some text after
Before text 漢字

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
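The regex101.com explanation is not reproduced here. As an alternative, the same pattern can be annotated inline using .NET's free-spacing mode, `(?x)` (IgnorePatternWhitespace). The layout and comments below are an addition, not part of the original answer; the comments give one reading of what each part does. In .NET, whitespace inside a character class stays literal even in this mode, so none of the classes below contain spaces.

```powershell
# Same pattern as in the answer, split across lines using (?x) (IgnorePatternWhitespace).
# Whitespace and '#' comments outside character classes are ignored by the regex engine.
$pattern = @'
(?x)
(?<=[^/]>)                # preceded by '>' whose previous char is not '/', i.e. not a self-closing '/>'
[^>\P{IsBasicLatin}]*     # zero or more Basic Latin (U+0000-U+007F) chars other than '>'
[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]+   # one or more Japanese chars
[^>\P{IsBasicLatin}]*     # zero or more trailing Basic Latin chars other than '>'
(?=</)                    # followed by the start of a closing tag
'@

$text = '<tag3>Some text before 日本語 and some text after</tag3><tag>Before text 漢字</tag>.'

# [regex]::Matches returns every non-overlapping match; collect each match's text.
$found = @([regex]::Matches($text, $pattern) | ForEach-Object Value)
$found
```

`(?x)` is an inline option, so the annotated `$pattern` can also be passed unchanged to `Select-String -Pattern`, as in the answer's code.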

Answered 2023-06-25
