LINQ: remove words if all of them are in a stop-words list

Posted by igsr9ssn on 2022-12-06 in Other

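I have an array of strings, each of which can contain one or more words. When an entry is a single word, removing it is easy, but I am struggling with entries of several words, which should be removed only if all of their words are in the stop-words list. I would prefer to solve this with LINQ.
Imagine I have this array of strings: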

then use 
then he
the image
and the
should be in
should be written

I want to end up with only:

then use 
the image
should be written

So lines whose words are all stop words should be removed, and lines that mix stop words with other words should be kept.
My stop-words array: string[] stopWords = {"a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and"};
Thanks for your help.

linq

Source: https://stackoverflow.com/questions/74426744/remove-words-if-all-of-them-are-in-a-stop-words-list

3 Answers

eqqqjvef1#

One way to solve this is the following:

string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

string input = """"
            then use 
            then he
            the image
            and the
            should be in
            should be written
            """";

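var array = input.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

// RemoveEmptyEntries stops a trailing space (as in "then use ") from producing an empty "word" that is never a stop word.
var filteredArray = array.Where(x => x.Split(' ', StringSplitOptions.RemoveEmptyEntries).Any(y => !stopWords.Contains(y))).ToList();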
var result = string.Join(Environment.NewLine, filteredArray);

Console.WriteLine(result);

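The first two statements just set up the data.
The third splits the string on line breaks, turning it into an array of lines. (Environment.NewLine is used so that the code also works on Linux.)
The fourth processes each line by splitting it on spaces, which yields the individual words, and then checks whether any of those words is missing from the stopWords list. If at least one word is missing, the Where condition is satisfied and the whole line is returned in filteredArray.
The fifth simply joins the individual lines to form the final result string.
The result should look like this: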

then use
then he
the image
should be written

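Note that the stopWords list used in this answer contains the word them but not then, so the second result line (then he) is not removed. With the list shown in the question, which does include then, that line would be filtered out as well.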
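
As a variation on the filter above, here is a minimal sketch that loads the stop words into a HashSet<string> so each lookup does not scan the whole array. It assumes case-insensitive matching is wanted, which the question does not state. The names stopWordSet, lines and kept are illustrative only, and the stop-word list is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stop words as given in the question (this list includes "then").
string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "then", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

// Assumption: matching should ignore case; the question does not say so.
var stopWordSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

string[] lines = { "then use ", "then he", "the image", "and the", "should be in", "should be written" };

// Keep a line when at least one of its words is not a stop word.
List<string> kept = lines
    .Where(line => line
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Any(word => !stopWordSet.Contains(word)))
    .ToList();

foreach (var line in kept)
{
    Console.WriteLine(line);
}
```

With this list the lines that survive are the three the question asks for: "then use", "the image" and "should be written".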

gajydyqb2#

Use the Intersect method, like this:

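// WordsList holds the input lines; stopWords is the array from the question.
List<string> result = new List<string>();
foreach (string word in WordsList)
{
    List<string> splitData = word.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries).ToList();
    // Intersect returns distinct values, so compare against the distinct word count.
    bool allOfWordsIsInStopWords = splitData.Intersect(stopWords).Count() == splitData.Distinct().Count();
    if (!allOfWordsIsInStopWords) result.Add(word); // keep lines with at least one non-stop word
}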
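
One thing to watch with Intersect is that it returns distinct values, so a line that repeats a stop word (for example "the the") yields fewer elements than the line has words. A minimal sketch of an alternative that avoids counting altogether: a line consists only of stop words when Except leaves nothing behind. The helper name allStopWords, the shortened stop-word list and the sample lines are hypothetical.

```csharp
using System;
using System.Linq;

// Shortened stop-word list, for illustration only.
string[] stopWords = { "the", "and", "then", "he", "should", "be", "in" };

// True when every word in the line is a stop word:
// removing the stop words from the line's words leaves nothing.
Func<string, bool> allStopWords = line =>
    !line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
         .Except(stopWords)
         .Any();

string[] lines = { "the the", "and the", "the image", "should be written" };

// Keep only the lines that contain at least one non-stop word.
string[] kept = lines.Where(line => !allStopWords(line)).ToArray();

Console.WriteLine(string.Join(Environment.NewLine, kept));
```

Here "the the" and "and the" are dropped, while "the image" and "should be written" are kept.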

qvk1mo1f3#

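Based on the original problem description:
I have an array of strings, each of which can contain one or more words. When an entry is a single word, removing it is easy, but I am struggling with entries of several words, which should be removed only if all of their words are in the stop-words list. I would prefer to solve this with LINQ.
The following code addresses the part of that description that was in bold in the original post.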

using System.Text.RegularExpressions;

string[] stopWords = { "a", "an", "x", "y", "z", "this", "the", "me", "you", "our", "we", "I", "them", "ours", "more", "will", "he", "she", "should", "be", "at", "on", "in", "has", "have", "and" };

string[] inputStrings = { "then use", "then he", "the image", "and the", "should be in", "should be written" };

var wordSeparatorPattern = new Regex(@"\s+");

var outputStrings = inputStrings.Where((words) => 
{
    return wordSeparatorPattern.Split(words).Any((word) =>
    {
        return !stopWords.Contains(word);
    });
});

foreach (var item in outputStrings)
{
    Console.WriteLine(item);
}
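
Regex.Split returns an empty string at the start or end of its result when the input has leading or trailing whitespace, and an empty string is never in the stop-word list, so a padded line would always be kept. The input in the code above has no such padding, so it is unaffected. A hedged sketch that extracts the words with \S+ instead of splitting; the padded sample data, the shortened stop-word list and the name wordPattern are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened stop-word list, for illustration only.
string[] stopWords = { "the", "and", "then", "he" };

// Hypothetical samples with leading or trailing whitespace.
string[] inputStrings = { " and the ", "then he", "  the image" };

// One match per run of non-whitespace characters, so no empty entries appear.
var wordPattern = new Regex(@"\S+");

var outputStrings = inputStrings
    .Where(line => wordPattern.Matches(line)
        .Cast<Match>()
        .Any(match => !stopWords.Contains(match.Value)))
    .ToList();

foreach (var item in outputStrings)
{
    Console.WriteLine(item);
}
```

Only the padded "the image" line survives, because image is the only word outside the stop-word list.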
