pandas: removing words that are not in a dictionary

Posted by eqoofvh9 on 2022-12-09 in Other

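I have a DataFrame containing tuples of words taken from online reviews. It contains too many typos, so I am trying to remove the words that are not in a dictionary. The dictionary I am trying to use is KBBI (an Indonesian dictionary), https://pypi.org/project/kbbi/, from...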

pip install kbbi
from kbbi import KBBI

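I am having trouble matching my data against the dictionary because I am not familiar with its data type. The function I found in the original resource lets us look up a word and returns its definition. I only need to check whether a word exists in the dictionary (or, alternatively, extract all of the dictionary's words into a txt file). Here is an example input...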

# trying to look for "anjing" in the dictionary. Anjing is Indonesian for dog.
anjing = KBBI('anjing')
print(anjing)

Output:

an.jing
1. (n)  mamalia yang biasa dipelihara untuk menjaga rumah, berburu, dan sebagainya 〔Canis familiaris〕
2. (n)  anjing yang biasa dipelihara untuk menjaga rumah, berburu, dan sebagainya 〔Canis familiaris〕

def remove_typo(text):
    text = [word for word in text if word in KBBI]
    return text

df['after'] = df['before'].apply(lambda x: remove_typo(x))

I get an error on the second line saying "argument of type 'type' is not iterable".
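The message can be reproduced without kbbi. The sketch below uses a hypothetical stand-in class, FakeDictionary, to show that applying the in operator to a class itself (rather than to a container) raises a TypeError. It assumes KBBI is likewise a plain class that has to be called with a word, as in KBBI('anjing') above.

```python
# FakeDictionary is a hypothetical stand-in for a lookup class such as KBBI;
# it is not part of the kbbi package.
class FakeDictionary:
    def __init__(self, word):
        self.word = word


def membership_error(container, word):
    """Return the TypeError message raised by `word in container`, or None."""
    try:
        _ = word in container
    except TypeError as exc:
        return str(exc)
    return None


# A class object is not a container, so the membership test fails on it.
class_message = membership_error(FakeDictionary, "anjing")
# A set is a container, so the same test succeeds.
set_message = membership_error({"anjing", "kucing"}, "anjing")

print(class_message)
print(set_message)
```

The exact wording of the message may differ between Python versions, but the cause is the same: the right-hand side of in has to be something that supports membership tests.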

pandas

Source: https://stackoverflow.com/questions/74739534/remove-word-not-in-dictionary-dictionary

4 answers

kpbwa7wx1#

I checked the kbbi documentation; the solution is to use try-except:

import pandas as pd
from kbbi import KBBI, TidakDitemukan

L = [['masih', 'blom', 'cair', 'jugagmn', 'in'],
     ['alhmdllh', 'sangat', 'membantu', 'meski', 'bunga', 'cukup', 'besar']]

df = pd.DataFrame({'before': L})

def remove_typo(text):
    out = []
    for word in text:
        try:
            # the lookup raises TidakDitemukan when the word is not found
            if KBBI(word):
                out.append(word)
        except TidakDitemukan:
            pass
    return out

df['after'] = df['before'].apply(remove_typo)

print (df)
                                              before  \
0                   [masih, blom, cair, jugagmn, in]   
1  [alhmdllh, sangat, membantu, meski, bunga, cuk...   

                                            after  
0                                   [masih, cair]  
1  [sangat, membantu, meski, bunga, cukup, besar]
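One thing to watch with this approach is that it performs one lookup per token, so a word that appears in many reviews is looked up again each time. Assuming, as the package description suggests, that each KBBI(word) call fetches a page from the online dictionary, caching the result per distinct word may help. A minimal sketch follows; make_cached_filter and fake_lookup are hypothetical names, and fake_lookup stands in for a function that wraps KBBI(word) in the try/except above and returns True or False.

```python
from functools import lru_cache


def make_cached_filter(lookup):
    """Build a remove_typo-style function that calls `lookup` once per distinct word."""
    cached_lookup = lru_cache(maxsize=None)(lookup)

    def remove_typo(words):
        return [word for word in words if cached_lookup(word)]

    return remove_typo


# Hypothetical stand-in for the real dictionary check, so the sketch runs offline.
calls = []
known_words = {"masih", "cair", "sangat", "membantu"}


def fake_lookup(word):
    calls.append(word)  # record every uncached lookup
    return word in known_words


remove_typo = make_cached_filter(fake_lookup)
rows = [["masih", "blom", "cair"], ["masih", "sangat", "blom", "membantu"]]
filtered = [remove_typo(row) for row in rows]

print(filtered)
print(len(calls))  # number of lookups actually performed
```

With the DataFrame above, the resulting function could be passed to df['before'].apply(remove_typo) in the same way.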

Answered 2022-12-09

fivyi3re2#

text = [word for word in text if word in BKKI]

赞(0）回复(0）举报 2022-12-09

z4iuyo4d3#

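First, make sure you really should be using word in KBBI rather than word in table.
If that is correct, then the error comes from the Series, and you can modify the function so that it returns immediately when the value is not valid: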

def remove_typo(text):
    if isinstance(text, list): 
        text = [word for word in text if word in KBBI] # should this be "table"?
        # text = [word for word in text if word in table]
    return text

df['after'] = df['before'].apply(remove_typo)

Or use an exception:

def remove_typo(text):
    try: 
        return [word for word in text if word in KBBI]
    except ValueError: # use the correct error here 
        return text

df['after'] = df['before'].apply(remove_typo)
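If a local word list is available, the word in table variant mentioned above becomes a plain set lookup. A minimal sketch, assuming a hypothetical text file with one word per line (written to a temporary file here so the example is self-contained). The file contents are illustrative and are not taken from KBBI.

```python
import os
import tempfile


def load_word_table(path):
    """Read one word per line into a set of lowercase words."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_typo(words, table):
    """Keep only the words found in `table`; pass non-list values through unchanged."""
    if not isinstance(words, list):
        return words
    return [word for word in words if word.lower() in table]


# Illustrative word list; a real one would come from a dictionary export.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("masih\ncair\nsangat\n")
    word_list_path = tmp.name

table = load_word_table(word_list_path)
os.remove(word_list_path)

result = remove_typo(["masih", "blom", "cair", "jugagmn", "in"], table)
print(result)
```

With a DataFrame this could be applied as df['before'].apply(remove_typo, table=table), since Series.apply forwards extra keyword arguments to the function.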

Answered 2022-12-09

vatpfxk54#

The first one

Answered 2022-12-09
