Comparing the words in two lines of text in C# (.NET)

Posted by tv6aics1 12 months ago in .NET

Hi, I'm using C# in my project and I'm trying to get the following output.

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";

string[] str1Words = str1.ToLower().Split(' ');
string[] str2Words = str2.ToLower().Split(' ');

var uniqueWords = str2Words
  .Except(str1Words)
  .Concat(str1Words.Except(str2Words))
  .ToList();

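This gives me the output and, bird, has, which is correct, but what I want is something like this:
has - present in first string not present in second string
and a bird - not present in first string but present in second string
For example, a second use case: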

string S1 = "Added";
string S2 = "Edited";

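Here the output should be:
Added - present in first string not present in second string
Edited - not present in first string but present in second string
I want some indication of what is present in the first string but not the second, and what is present in the second but not the first. The comparison should be word by word, not letter by letter. Can anyone help me with this? I would be very grateful. Thanks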
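A note on the first example's output: Enumerable.Except is a set difference. It returns the distinct elements of the left sequence that do not occur in the right sequence, in their original order. Because "a" occurs in both strings, it is never part of either difference, and repeated words are collapsed. A minimal top-level sketch (C# 9 or later, word arrays hard-coded from the question) showing this:

```csharp
using System;
using System.Linq;

// Words of the second string, including the duplicated "a"
string[] left = { "cat", "meet's", "a", "dog", "and", "a", "bird" };
// Words of the first string
string[] right = { "cat", "meet's", "a", "dog", "has" };

// Except returns the distinct elements of `left` that are not in `right`,
// keeping their original order, so both copies of "a" are dropped
string[] difference = left.Except(right).ToArray();

Console.WriteLine(string.Join(", ", difference)); // and, bird
```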

Answer #1, by cs7cruho

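I suggest matching words instead of splitting. Let a word be a sequence of letters and apostrophes, and extract the words with the help of a regular expression (note that splitting on spaces ignores punctuation, so "cat", "cat," and "cat!" would be treated as three different words). Then query the matches of both given strings: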

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions; 

...

private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+"); 

// 1 - in text1, 2 - in text2, 3 - in both text1 and text2 
private static List<(string word, int presentAt)> MyWords(string text1, string text2) {
  HashSet<string> words1 = WordsRegex
    .Matches(text1)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

  HashSet<string> words2 = WordsRegex
    .Matches(text2)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

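  // Pass the comparer so that "Cat" and "cat" are not reported twice
  return words1
    .Union(words2, StringComparer.OrdinalIgnoreCase)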
    .Select(word => (word, presentAt: (words1.Contains(word) ? 1 : 0) | 
                                      (words2.Contains(word) ? 2 : 0)))
    .ToList();
}

Demo:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
    
var result = MyWords(str1, str2);
    
var report = string.Join(Environment.NewLine, result);
    
Console.Write(report);

Output:

(Cat, 3)         # 3: in both str1 and str2 
(meet's, 3)      # 3: in both str1 and str2
(a, 3)           # 3: in both str1 and str2
(dog, 3)         # 3: in both str1 and str2 
(has, 1)         # 1: in str1 only
(and, 2)         # 2: in str2 only
(bird, 2)        # 2: in str2 only
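The punctuation caveat mentioned in this answer can be checked with a small sketch. It tokenizes an illustrative string (not from the question) both ways: Split(' ') keeps "cat," and "cat!" as separate tokens, whereas the same [\p{L}']+ pattern used above extracts only runs of letters and apostrophes, so a case-insensitive set collapses them into one word:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative input with punctuation attached to some words
string text = "cat cat, cat! meet's";

// Splitting on spaces keeps the punctuation inside the tokens
HashSet<string> bySplit = text
    .Split(' ')
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

// Same pattern as in the answer: runs of letters and apostrophes
HashSet<string> byRegex = Regex
    .Matches(text, @"[\p{L}']+")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

Console.WriteLine(bySplit.Count); // 4 ("cat", "cat,", "cat!", "meet's")
Console.WriteLine(byRegex.Count); // 2 ("cat", "meet's")
```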

If you want verbose output:

string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";

var result = MyWords(str1, str2);

string[] options = new string[] {
  "not present",
  "present in first string not present in second string",
  "not present in first string but present in second string",
  "present in first string and present in second string"
};
        
var report = string.Join(Environment.NewLine, result
  .Select(pair => $"{pair.word} - {options[pair.presentAt]}"));

Console.Write(report);

Output:

Cat - present in first string and present in second string
meet's - present in first string and present in second string
a - present in first string and present in second string
dog - present in first string and present in second string
has - present in first string not present in second string
and - not present in first string but present in second string
bird - not present in first string but present in second string
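If you would rather have one line per category, closer to the format in the question ("and bird - not present in first string but present in second string"), the (word, presentAt) pairs can be grouped by their flag. A sketch under the assumption that pairs shaped like the result of MyWords are available; here they are hard-coded so the block runs on its own:

```csharp
using System;
using System.Linq;

// Hard-coded stand-in for MyWords(str1, str2): 1 = first only, 2 = second only, 3 = both
(string word, int presentAt)[] result =
{
    ("Cat", 3), ("meet's", 3), ("a", 3), ("dog", 3),
    ("has", 1), ("and", 2), ("bird", 2)
};

string[] options =
{
    "not present",
    "present in first string not present in second string",
    "not present in first string but present in second string",
    "present in first string and present in second string"
};

// Skip words common to both strings, then build one line per flag value
string[] lines = result
    .Where(pair => pair.presentAt != 3)
    .GroupBy(pair => pair.presentAt)
    .OrderBy(grouping => grouping.Key)
    .Select(grouping => string.Join(" ", grouping.Select(pair => pair.word))
                        + " - " + options[grouping.Key])
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, lines));
// has - present in first string not present in second string
// and bird - not present in first string but present in second string
```

GroupBy keeps the original order of the words inside each group, so "and bird" comes out in the order the words appear in the second string.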

Answer #2, by c3frrgcw

str2Words.Except(str1Words)

finds the words in str2Words that are not present in str1Words.

str1Words.Except(str2Words)

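finds the words in str1Words that are not present in str2Words.
Since you need to use the two results separately, don't concatenate them. Instead, call string.Join on each result to get a space-separated string, and append the "present" text you planned for each.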
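A minimal sketch of that suggestion, reusing the question's Split-based word arrays and the "present" texts from the question. The helper name Describe is hypothetical, and the sketch assumes a line should be skipped when one direction has no words:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: reports each direction separately instead of concatenating them
static List<string> Describe(string str1, string str2)
{
    string[] str1Words = str1.ToLower().Split(' ');
    string[] str2Words = str2.ToLower().Split(' ');

    // Join each difference into a space-separated string
    string onlyInFirst = string.Join(" ", str1Words.Except(str2Words));
    string onlyInSecond = string.Join(" ", str2Words.Except(str1Words));

    var lines = new List<string>();
    // Only add a line when that direction actually has words
    if (onlyInFirst.Length > 0)
        lines.Add(onlyInFirst + " - present in first string not present in second string");
    if (onlyInSecond.Length > 0)
        lines.Add(onlyInSecond + " - not present in first string but present in second string");
    return lines;
}

Console.WriteLine(string.Join(Environment.NewLine, Describe("Cat meet's a dog has", "Cat meet's a dog and a bird")));
Console.WriteLine(string.Join(Environment.NewLine, Describe("Added", "Edited")));
```

Because the question's code lowercases both strings, the second case prints "added" and "edited". Dropping ToLower and passing StringComparer.OrdinalIgnoreCase to Except would keep the original casing while still comparing case-insensitively.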
