Regex: extract everything between quotes, excluding the semicolon separator

Posted by exdqitrt on 2023-04-13 in Other


I am looking for a regex that extracts every token between the quotes, excluding the semicolon separators.
Expected result:

azerty
y9uiuih
qwsdf
ftyg hjhh
w__g_-91

From:

-from:"azerty;y9uiuih;qwsdf"
-to:"ftyg hjhh;w__g_-91"

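I tried (?![[:alnum:]]+:)([[:alnum:]]+| |-|_)+

Rules:

There can be anywhere from zero to an unlimited number of ; between the quotes, and each one separates two words.
There is only one pair of " per line.
The words between the quotes can consist of any characters except ; and ".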

regex

Source: https://stackoverflow.com/questions/75957582/extract-everything-between-quotes-excluding-the-semicolon-separator

3 Answers


j0pj023g1#

Judging by the example output, you want to:

1. Match the quoted "words".
2. Within each "word", replace ; with a new line.

Code:

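using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "-from:\"azerty;y9uiuih;qwsdf\"\r\n-to:\"ftyg hjhh;w__g_-91\"";

// Match each quoted run, quotes included.
Regex regex = new Regex("\"[^\"]*\"");

// Strip the quotes, then turn every ';' into a line break.
// On .NET Framework, add .Cast<Match>() before Select.
string result = string.Join(Environment.NewLine, regex
  .Matches(text)
  .Select(match => match.Value.Trim('"').Replace(";", Environment.NewLine)));

Console.Write(result);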

If you would rather Split on ; than replace it with a new line:

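using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "-from:\"azerty;y9uiuih;qwsdf\"\r\n-to:\"ftyg hjhh;w__g_-91\"";

// Match each quoted run, quotes included.
Regex regex = new Regex("\"[^\"]*\"");

// Strip the quotes and split every match on ';' into one flat array.
string[] words = regex
  .Matches(text)
  .SelectMany(match => match.Value.Trim('"').Split(';'))
  .ToArray();

Console.Write(string.Join(Environment.NewLine, words));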

2023-04-13

u3r8eeie2#

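If you only want the matches themselves, in C# you can use a lookahead and a (variable-length) lookbehind:

(?<="[^"\r\n]*)[^;\r\n"]+(?=[^"\r\n]*")

Explanation

(?<="[^"\r\n]*) asserts a " to the left, with optional characters other than " or a newline in between
[^;\r\n"]+ matches one or more characters other than ; and " or a newline
(?=[^"\r\n]*") asserts a " to the right, with optional characters other than " or a newline in between

In the C# verbatim string below, each " of the pattern is written as "".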

C# example:

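using System;
using System.Text.RegularExpressions;

// Verbatim strings: "" stands for a single double quote.
string pattern = @"(?<=""[^""\r\n]*)[^;\r\n""]+(?=[^""\r\n]*"")";
string input = @"-from:""azerty;y9uiuih;qwsdf""
-to:""ftyg hjhh;w__g_-91""";

// Each match is one word between the quotes.
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine(m.Value);
}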

Output

azerty
y9uiuih
qwsdf
ftyg hjhh
w__g_-91

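If each match must contain at least one word character (so that, for example, a run of spaces alone does not match) and may only use the characters that occur in the example:

(?<="[^"\r\n]*)[\p{Zs}\t]*\w[\w\p{Zs}\t-]*(?=[^"\r\n]*")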
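As a rough sketch of how the two patterns in this answer differ, the following runs both against a hypothetical input whose middle segment contains only spaces. The input string and the bracket formatting are assumptions made for illustration; the patterns are the ones given in this answer, written as C# verbatim strings with doubled quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from this answer, as verbatim strings ("" is one quote).
string loose = @"(?<=""[^""\r\n]*)[^;\r\n""]+(?=[^""\r\n]*"")";
string strict = @"(?<=""[^""\r\n]*)[\p{Zs}\t]*\w[\w\p{Zs}\t-]*(?=[^""\r\n]*"")";

// Hypothetical input: the middle segment holds only spaces.
string input = @"-from:""azerty;   ;qwsdf""";

foreach (string pattern in new[] { loose, strict })
{
    Console.WriteLine(pattern);
    foreach (Match m in Regex.Matches(input, pattern))
    {
        // Brackets make whitespace-only matches visible.
        Console.WriteLine($"  [{m.Value}]");
    }
}
```

With this input, the first pattern should also return the whitespace-only segment, while the second should return only azerty and qwsdf.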

2023-04-13

ulmd4ohb3#

(?<title>\w+):"((?<word>[^;"]+);?)+(?="$)
This pattern captures the title ("from"/"to") as well as the words, using named groups.
Given:

using System;
using System.Text.RegularExpressions;

// Raw string literals require C# 11 or later.
var input = """
    -from:"azerty;y9uiuih;qwsdf"
    -to:"ftyg hjhh;w__g_-91"
    """;
var pattern = """(?<title>\w+):"((?<word>[^;"]+);?)+(?="$)""";

we can write:

// Multiline makes $ match at the end of each line.
Regex r = new Regex(pattern, RegexOptions.Multiline);

var matches = r.Matches(input);
foreach (Match m in matches)
{
    string title = m.Groups["title"].Value;
    // The "word" group repeats, so read every capture, not just the last one.
    foreach (Capture c in m.Groups["word"].Captures)
    {
        string word = c.Value;
        Console.WriteLine($"{title}: {word}");
    }
}

This prints:

from: azerty
from: y9uiuih
from: qwsdf
to: ftyg hjhh
to: w__g_-91

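Note that Captures returns every capture made by a group that sits inside a repetition. Here, those are the words captured between the quotes.
Explanation:

(?<name>pattern) captures pattern into a named group.
\w+ the title consists of at least one word character.
(?<title>\w+): the title is followed by a colon.
[^;"]+ a word consists of at least one character that is neither a semicolon nor a double quote.
((?<word>[^;"]+);?)+ a word can be followed by an optional semicolon (;?), and this is repeated at least once (+).
pattern(?=postfix) the pattern must be followed by the postfix.
"$ the postfix is a double quote followed by an end of line. Note that we must specify the regex option Multiline so that ^ and $ mean the start and end of a line rather than the start and end of the whole string.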

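One caveat worth checking: in .NET, $ with RegexOptions.Multiline matches before \n but does not treat \r as part of the line ending, so a lookahead ending in "$ can miss lines that end in \r\n. The sketch below compares the pattern from this answer with a variant that allows an optional \r before the line end. The \r? variant and the explicit CRLF input are assumptions added for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Input with explicit Windows (CRLF) line endings.
string input = "-from:\"azerty;y9uiuih;qwsdf\"\r\n-to:\"ftyg hjhh;w__g_-91\"";

// Pattern from the answer, and a variant tolerating '\r' before the line end.
string original = @"(?<title>\w+):""((?<word>[^;""]+);?)+(?=""$)";
string crlfSafe = @"(?<title>\w+):""((?<word>[^;""]+);?)+(?=""\r?$)";

foreach (string pattern in new[] { original, crlfSafe })
{
    Console.WriteLine(pattern);
    foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Multiline))
    {
        string title = m.Groups["title"].Value;
        foreach (Capture c in m.Groups["word"].Captures)
        {
            Console.WriteLine($"  {title}: {c.Value}");
        }
    }
}
```

On this input the original pattern should report only the words of the last line, while the \r? variant should report the words of both lines. If the input is known to use \n only, the original pattern is enough.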
2023-04-13
