Extracting tags from a string using a single regex capture in C#

Posted by 8fq7wneg on 2023-06-25 in C#


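I have a string containing a set of tags that I want to match with a regex. The tags are separated by commas, and a tag can contain spaces, special characters, and even emoji.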

Example input

#
tag1, tag with space, !@#%^, 🦄

Expected output

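1. tag1
2. tag with space
3. !@#%^
4. 🦄

I have managed to extract the tags with the following C# code, but it feels clumsy because it relies heavily on splitting and trimming: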

var match = Regex.Match(input, @"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());

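My goal is to write a regex whose captures I can iterate over to extract the tags directly, without any additional string manipulation.
Is there a way to write such a regex that extracts these tags cleanly and efficiently in C#? Ideally, the regex would trim leading and trailing whitespace and handle a variable number of data fields within a single record.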


Source: https://stackoverflow.com/questions/42615783/extracting-tags-from-a-string-using-a-single-regex-capture-in-c-sharp

3 Answers


qyzbxkaa #1

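This works: (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses C#'s CaptureCollection to pick up a variable number of fields within a single record.
The same regex can also retrieve *all* records at once, with each record containing its own variable number of fields.
Trimming is built into the regex as well: the \s* on either side of the capture group keeps leading and trailing whitespace out of each capture.
Expanded: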

(?ms)                   # Inline modifiers:  multi-line, dot-all
 ^ \# \s+                # Beginning of record
 (?:                     # Quantified group, 1 or more times, get all fields of record at once
      \s*                     # Trim leading wsp
      (                       # (1 start), # Capture collector for variable fields
           (?:                     # One char at a time, but not comma or begin of record
                (?!
                     , 
                  |  ^ \# \s+ 
                )
                .         
           )*?
      )                       # (1 end)
      \s* 
      (?: , | $ )             # End of this field, comma or EOL
 )+

C# code:

string sOL = @"
#
tag1, tag with space, !@#%^, 🦄";

Regex RxOL = new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
    CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
    Console.WriteLine("-------------------------");
    for (int i = 0; i < ccOL1.Count; i++)
        Console.WriteLine("  '{0}'", ccOL1[i].Value );
    _mOL = _mOL.NextMatch();
}

Output:

-------------------------
  'tag1'
  'tag with space'
  '!@#%^'
  '??'
  ''
Press any key to continue . . .
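
Two details in that output are worth a note. The '??' is most likely the console failing to render the emoji rather than a regex problem, and the final '' appears to come from one last zero-length pass of the quantified group, which can still match at the end of the line. The following is a minimal sketch, not part of the original answer, that reuses the same pattern, collects the captures of each record into a list, and skips zero-length captures. The names TagRecords and Extract are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagRecords
{
    // Same pattern as in the answer above: one match per "#" record,
    // one capture of group 1 per comma-separated field.
    private static readonly Regex RecordRx =
        new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");

    // Returns one list of tags per record; zero-length captures are dropped.
    // (Assumption: an empty field is never a meaningful tag.)
    public static List<List<string>> Extract(string input)
    {
        var records = new List<List<string>>();
        for (Match m = RecordRx.Match(input); m.Success; m = m.NextMatch())
        {
            var tags = new List<string>();
            foreach (Capture c in m.Groups[1].Captures)
            {
                if (c.Length > 0)
                    tags.Add(c.Value);
            }
            records.Add(tags);
        }
        return records;
    }
}
```

Called on an input with two records, such as "#\ntag1, tag with space\n#\na, b", this should yield two lists, one per record. Filtering on Capture.Length keeps the pattern itself unchanged, at the cost of also discarding fields that are intentionally empty.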


aij0ehis #2

Nothing wrong with cheating ;]

string input = @"#
tag1, tag with space, !@#%^, 🦄";

string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
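
If the separate Trim step is the part that feels clumsy, the trimming can be folded into the split itself. The sketch below is a variation on the snippet above, not the answerer's code: it takes the last non-empty line and splits it with Regex.Split on a comma together with any surrounding whitespace. The names LastLineTags and Extract are hypothetical, and it assumes the input contains at least one non-empty line (Last() throws otherwise).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LastLineTags
{
    public static string[] Extract(string input)
    {
        // Take the last non-empty line; splitting on both '\n' and '\r'
        // copes with Windows and Unix line endings alike.
        string line = input
            .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            .Last();

        // The separator pattern swallows the whitespace around each comma,
        // so only the two ends of the line still need a Trim.
        return Regex.Split(line.Trim(), @"\s*,\s*");
    }
}
```

Like the original snippet, this handles a single record only, since it looks at just one line of the input.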


pu3pd22g #3

You can do it without a regex. Just split like this:

var result = input
    .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
    .Skip(1)
    .SelectMany(x => x.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                      .Select(y => y.Trim()));
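
One caveat with Skip(1): it drops only the first line, so if the input held several records, each later "#" marker line would come through as a tag. A minimal sketch of one way around that, assuming a line consisting solely of "#" always marks a record and is never itself a tag, is to filter those lines out instead. The names SplitTags and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitTags
{
    // Splits the input into lines, drops "#" marker lines, then splits the
    // remaining lines on commas and trims each piece.
    public static List<string> Extract(string input)
    {
        return input
            .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(line => line.Trim() != "#")
            .SelectMany(line => line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(tag => tag.Trim())
            .Where(tag => tag.Length > 0)   // discard fields that were only whitespace
            .ToList();
    }
}
```

This flattens the tags of all records into one list; if the grouping by record matters, the capture-based regex from the first answer keeps it.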
