Python pyspellchecker does not recognize misspelled words

fivyi3re  posted on 2023-03-07  in  Python

I recently installed pyspellchecker, but it does not seem to work correctly. How can I fix this?

from spellchecker import SpellChecker
spell = SpellChecker()

spell.correction('hapenning')
# 'happenning' -> that's ok

spell.correction('helo')
# 'helo' -> not ok

spell.known(['adress'])
# {'adress'} -> not ok

Here are my versions:

Python 3.8.10

pyspellchecker
Name: pyspellchecker
Version: 0.6.3
Summary: Pure python spell checker based on work by Peter Norvig
Home-page: https://github.com/barrust/pyspellchecker
Author: Tyler Barrus
Author-email: barrust@gmail.com
License: MIT
Location: /home/edmz/.local/lib/python3.8/site-packages
Requires: 
Required-by:
c3frrgcw1#

I'm going to assume that the corrections you expect for these misspelled words:

  • helo
  • adress

are:

  • hello
  • address

but you are getting:

  • help
  • dress

This is because spell.correction(word) returns the most likely result for the misspelled word, which is not necessarily the word you had in mind.

from spellchecker import SpellChecker

spell = SpellChecker(language='en', distance=2)
words = ['hapenning', 'helo', 'adress']
for word in words:
    print(f'Original word: {word} ---- Corrected word: {spell.correction(word)}')

# output
# Original word: hapenning ---- Corrected word: happening
# Original word: helo ---- Corrected word: help
# Original word: adress ---- Corrected word: dress

If we change the 'distance' variable to 1, we get:

Original word: hapenning ---- Corrected word: None
Original word: helo ---- Corrected word: help
Original word: adress ---- Corrected word: dress
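
To see why 'hapenning' is no longer corrected at distance 1 while the other two still are, it helps to count the edits between each typo and the words involved. The sketch below is not part of pyspellchecker; osa_distance is a hypothetical helper that computes the optimal string alignment distance (insert, delete, replace, adjacent transposition), which I am assuming is a fair stand-in for the edit operations the library uses.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts inserts, deletes, replaces and adjacent transpositions.
    Hypothetical helper for illustration, not a pyspellchecker function.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # replace or match
            )
            # adjacent transposition, e.g. "eh" <-> "he"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


pairs = [
    ("hapenning", "happening"),
    ("helo", "hello"),
    ("helo", "help"),
    ("adress", "address"),
    ("adress", "dress"),
]
for typo, target in pairs:
    print(f"{typo} -> {target}: {osa_distance(typo, target)}")
```

Under this measure 'happening' is two edits away from 'hapenning', so it is out of reach at distance=1, while 'hello' and 'help' are both one edit away from 'helo', and 'address' and 'dress' are both one edit away from 'adress'.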

If you look at the pyspellchecker source code, you will see that it calls this function when you use spell.correction:

def edit_distance_1(self, word: KeyT) -> typing.Set[str]:
        """Compute all strings that are one edit away from `word` using only
        the letters in the corpus

        Args:
            word (str): The word for which to calculate the edit distance
        Returns:
            set: The set of strings that are edit distance one from the provided word"""
        tmp_word = ensure_unicode(word).lower() if not self._case_sensitive else ensure_unicode(word)
        if self._check_if_should_check(tmp_word) is False:
            return {tmp_word}
        letters = self._word_frequency.letters
        splits = [(tmp_word[:i], tmp_word[i:]) for i in range(len(tmp_word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

We can call this function directly.

from spellchecker import SpellChecker

spell = SpellChecker(language='en', distance=2)
print(spell.edit_distance_1('helo'))

The resulting set contains 243 variants:

{'hpelo', 'heho', 'ehelo', 'hejo', 'heli', 'helg', 'ihelo', 'bhelo', 'helof', 'heloc', 'haelo', 'helov', 'xhelo', 'heeo', 'helol', 'hilo', 'helfo', 'velo', 'mhelo', 'heclo', 'helop', 'helh', 'hevlo', 'hulo', 'hkelo', 'heklo', 'helj', 'hxlo', 'heylo', 'lelo', 'hwlo', 'helq', 'ielo', 'hebo', 'helt', 'chelo', 'hnlo', 'hexo', 'heqo', 'heljo', 'heloa', 'helw', "'elo", 'htlo', 'hewo', 'hexlo', 'hemo', 'hselo', 'heio', 'hrelo', 'hejlo', 'thelo', 'hhlo', 'heoo', 'rhelo', 'hcelo', 'elo', 'hmlo', 'healo', "'helo", 'heloz', 'hxelo', 'uelo', 'oelo', 'hqlo', 'hllo', 'heuo', 'hvlo', 'helb', 'htelo', 'hello', 'helso', 'heloe', 'heco', 'heldo', 'held', 'hlo', 'pelo', 'selo', 'aelo', 'hel', 'hnelo', 'heplo', 'helor', 'heloi', 'hjelo', 'dhelo', 'uhelo', 'heleo', 'hoelo', 'hell', 'heqlo', 'ehlo', 'hyelo', 'hleo', 'helc', 'helvo', "hel'o", 'zelo', 'hplo', 'hrlo', 'hefo', 'heto', 'hely', 'hela', 'ahelo', 'khelo', 'hvelo', 'hmelo', 'helwo', 'heelo', 'helto', 'nelo', 'hedlo', 'heloo', 'helqo', 'helog', 'helm', 'helf', "h'elo", 'hblo', 'hezlo', 'hylo', 'gelo', 'halo', 'helr', 'heo', 'heblo', 'heloq', 'hslo', 'helxo', 'hero', 'heao', "he'lo", 'helon', 'heno', 'helom', 'whelo', 'helmo', 'hwelo', 'holo', 'hezo', 'heslo', 'heln', 'hetlo', 'telo', 'hdelo', 'heso', 'hlelo', 'helbo', 'helio', 'heyo', 'helot', 'hego', "helo'", 'hemlo', 'helob', "hel'", 'relo', 'hhelo', "h'lo", 'hqelo', 'welo', 'hglo', 'fhelo', 'hels', 'hepo', 'shelo', 'heulo', 'helco', 'helko', 'heol', 'qelo', 'helno', 'heloy', 'helo', 'yelo', 'hdlo', 'helx', 'vhelo', 'hfelo', 'helou', 'heloj', 'helro', 'jelo', 'hzlo', 'helzo', 'henlo', 'helox', "he'o", 'xelo', 'phelo', 'helow', 'ghelo', 'hjlo', 'heluo', 'hgelo', 'hele', 'hzelo', 'hklo', 'heflo', 'felo', 'kelo', 'helod', 'helos', 'delo', 'helok', 'helho', 'huelo', 'hielo', 'heko', 'helgo', 'nhelo', 'help', 'heolo', 'helu', 'heilo', 'hewlo', 'hedo', 'ohelo', 'heglo', 'helk', 'zhelo', 'helyo', 'hflo', 'eelo', 'melo', 'hclo', 'yhelo', 'heloh', 'jhelo', 'herlo', 'helv', 'hehlo', 
'qhelo', 'celo', 'belo', 'lhelo', 'hbelo', 'hevo', 'helpo', 'helao', 'helz'}
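
The count of 243 can be reproduced without the library. The following is a minimal standalone sketch of the same four edit operations; edits1 and ALPHABET are names I made up, and the alphabet (26 lowercase letters plus the apostrophe) is an assumption inferred from the variants listed above, such as "hel'o".

```python
import string


def edits1(word: str, letters: str) -> set:
    """All strings one edit away from `word`, using only `letters`.

    Standalone sketch that mirrors the four operations in the quoted
    library function; it is not the library code itself.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


# Assumption: the corpus letters are a-z plus the apostrophe (27 symbols).
ALPHABET = string.ascii_lowercase + "'"

variants = edits1("helo", ALPHABET)
print(len(variants))
print("hello" in variants, "help" in variants)
```

With 27 symbols there are 4 deletes, 3 transposes, 108 replaces and 135 inserts, which is 250 strings before de-duplication. Seven of them are duplicates ('helo' itself appears four times among the replaces, and 'hhelo', 'heelo', 'hello' and 'heloo' each appear twice among the inserts), leaving 243. Both 'hello' and 'help' are in the set.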

The vast majority of these strings are not in the English dictionary.
If, instead of spell.correction(word), we use spell.candidates(word), you can see all the valid words generated by spell.edit_distance_1(word):

from spellchecker import SpellChecker

spell = SpellChecker(language='en', distance=2)
words = ['hapenning', 'helo', 'adress']
for word in words:
    print(spell.candidates(word))

# output
# {'henning', 'happening', 'penning'}
# {'pelo', 'hel', 'helm', 'holo', 'heli', 'helos', 'hello', 'halo', 'hell', 'help', 'held', 'hero'}
# {'dress', 'address'}

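So for your input word helo there are 12 possible candidates. Remember that spell.correction(word) returns the most likely result for the misspelled word. This is based on the probability theory discussed in the article How to Write a Spelling Corrector, which pyspellchecker is designed around.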
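
As a rough illustration of "most likely", the sketch below picks the candidate with the highest word count. The counts in TOY_COUNTS are invented for this example and are not pyspellchecker's real frequencies, and most_likely is a hypothetical helper rather than a library function; the point is only that a more frequent candidate wins over a less frequent one at the same edit distance.

```python
from typing import Dict, Iterable

# Hypothetical frequency counts, invented purely for illustration.
# They are NOT the values shipped with pyspellchecker.
TOY_COUNTS: Dict[str, int] = {
    "help": 900,
    "held": 400,
    "hello": 300,
    "hell": 250,
    "hero": 200,
    "halo": 20,
}


def most_likely(candidates: Iterable[str], counts: Dict[str, int]) -> str:
    """Return the candidate with the highest count (unknown words count as 0)."""
    return max(candidates, key=lambda w: counts.get(w, 0))


candidates = ["hello", "help", "held", "hell", "hero", "halo"]
print(most_likely(candidates, TOY_COUNTS))  # help

# If 'hello' were more frequent in the word list, it would win instead.
boosted = dict(TOY_COUNTS, hello=5000)
print(most_likely(candidates, boosted))  # hello
```

In other words, the result depends on the word frequencies in the dictionary being used, not on which word the writer intended.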
I hope my answer helps you understand why helo became help instead of hello.
