Detecting which alphabet characters belong to in Python

Posted by biswetbf 12 months ago in Python


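Is there a library or some other simple way in Python to detect which alphabet a character belongs to? I know I can use Unicode code point ranges for this, but if there is already a built-in method, a library, or something else that provides the mapping, I would rather not reinvent the wheel.
Note: I am asking about the alphabet, not the language. "hello" and "hola" would both map to the Latin alphabet, whereas "привет" would map to Cyrillic.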

python

Source: https://stackoverflow.com/questions/28756796/detecting-which-alphabet-characters-belong-to-in-python

2 Answers


lmyy7pcs1#

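Python's unicodedata module was very useful here, as was this question/answer.
I could not find any simple way to detect the alphabet without writing a whole module, and I expected to run into a lot of edge cases, so I wrote a library. The GitHub page is here. With it, you can: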

pip install alphabet-detector

Then use it directly:

from alphabet_detector import AlphabetDetector
ad = AlphabetDetector()

ad.only_alphabet_chars(u"ελληνικά means greek", "LATIN") #False
ad.only_alphabet_chars(u"ελληνικά", "GREEK") #True
ad.only_alphabet_chars(u"frappé", "LATIN") #True
ad.only_alphabet_chars(u"hôtel lœwe", "LATIN") #True
ad.only_alphabet_chars(u"123 ångstrom ð áß", "LATIN") #True
ad.only_alphabet_chars(u"russian: гага", "LATIN") #False
ad.only_alphabet_chars(u"гага", "CYRILLIC") #True

I also wrote a few convenience methods for the major scripts:

ad.is_cyrillic(u"гага") #True  
ad.is_latin(u"howdy") #True
ad.is_cjk(u"hi") #False
ad.is_cjk(u'汉字') #True


jv4diomz2#

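The closest thing I could find to a solution is https://pypi.org/project/uniscripts/. It has not been updated in years, but it takes the right approach: it derives the scripts from the Unicode standard.
I updated uniscripts to Unicode 15.1 and submitted a merge request to the package maintainer. In the meantime, you can use it from my repository: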

pip install git+https://github.com/gaspardpetit/uniscripts.git

Then:

>>> from uniscripts import is_script, Scripts
>>> is_script(u"ελληνικά means greek", Scripts.LATIN)
False

>>> is_script(u"ελληνικά", Scripts.GREEK)
True

>>> is_script(u"гага", Scripts.CYRILLIC)
True

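To illustrate what "extracting scripts from the Unicode standard" can look like, here is a minimal sketch that parses a few lines laid out like the Unicode Character Database file Scripts.txt and looks characters up by code point. This is not the uniscripts implementation: the names `SAMPLE`, `parse_scripts`, `script_of`, and `scripts_in` are hypothetical, and the inlined ranges are a small hand-picked subset rather than the full data file.

```python
import re

# Hand-picked lines in the "range ; Script # comment" layout of Scripts.txt.
# Assumption: this is only a tiny illustrative subset of the real file.
SAMPLE = """\
0041..005A    ; Latin # uppercase A-Z
0061..007A    ; Latin # lowercase a-z
00BA          ; Latin # MASCULINE ORDINAL INDICATOR
03B1..03C9    ; Greek # subset of the Greek small letters
0410..044F    ; Cyrillic # subset of the basic Cyrillic letters
10C80..10CB2  ; Old_Hungarian # capital letters
"""

# Matches "XXXX ; Script" or "XXXX..YYYY ; Script" at the start of a line.
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a list of (start, end, script) tuples parsed from the text."""
    ranges = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            start = int(match.group(1), 16)
            # A single code point is a range whose end equals its start.
            end = int(match.group(2) or match.group(1), 16)
            ranges.append((start, end, match.group(3)))
    return ranges


def script_of(ch, ranges):
    """Return the script name for one character, or None if not in the data."""
    code_point = ord(ch)
    for start, end, script in ranges:
        if start <= code_point <= end:
            return script
    return None


def scripts_in(text, ranges):
    """Return the set of scripts found in text, skipping whitespace."""
    return {script_of(ch, ranges) for ch in text if not ch.isspace()}


RANGES = parse_scripts(SAMPLE)
print(scripts_in("гага", RANGES))
print(script_of("º", RANGES))
```

With the full Scripts.txt loaded instead of `SAMPLE`, a sorted range list with binary search would be a more sensible lookup than the linear scan used here.
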
alphabet-detector was not reliable for me because it returns the first word of the character's name, which is *often* the script name, but not always. For example:

>>> from alphabet_detector import AlphabetDetector
>>> ad = AlphabetDetector()
>>> ad.detect_alphabet("𐲌")
{'OLD'}

>>> ad.detect_alphabet("º")
{'MASCULINE'}

uniscripts, by contrast, returns the correct result:
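The failure mode above can be reproduced with the standard library alone. The following is a minimal sketch of the "first word of the character name" heuristic as described in this answer, not alphabet-detector's actual source; the helper name `first_name_words` is hypothetical.

```python
import unicodedata


def first_name_words(text):
    """Collect the first word of the Unicode name of each alphabetic character."""
    words = set()
    for ch in text:
        if ch.isalpha():
            # unicodedata.name returns the default "" for unnamed characters.
            name = unicodedata.name(ch, "")
            if name:
                words.add(name.split(" ")[0])
    return words


print(first_name_words("гага"))  # {'CYRILLIC'}
print(first_name_words("º"))     # {'MASCULINE'}
```

The second result comes from the character name MASCULINE ORDINAL INDICATOR, whose first word is not a script name.
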

>>> from uniscripts import get_scripts
>>> get_scripts("𐲌")
{'Old_Hungarian'}

>>> get_scripts("º")
{'Latin', 'Common'}
