regex 在python中查找一个字符串中的所有unicodes匹配项

pgky5nke 于 12个月前发布在 Python

关注(0)|答案(1)|浏览(121)

import re

b="united thats weak. See ya 👋"
print b.decode('utf-8')  #output: u'united thats weak. See ya \U0001f44b'

print re.findall(r'[\U0001f600-\U0001f650]',b.decode('utf-8'),flags=re.U) # output: [u'S']

如何获得输出\U0001f44b。
我需要处理的问题是

😀_❤️_😁_😂_😃_😄_😅_😆_😇_😈_😉_😊_😋_😌_😍_😎_😏_😐_😑_😒_😓_😔_😕_😖_😗_😘_😙_😚_😛_😜_😝_😞_😟_😠_😡_😢_😣_😤_😥_😦_😧_😨_😩_😪_😫_😬_😭_😮_😯_😰_😱_😲_😳_😴_😵_😶_😷_😸_😹_😺_😻_😼_😽_😾_😿_🙀_🙁_🙂_🙃_🙄_🙅_🙆_🙇_🙈_🙉_🙊_🙋_🙌_🙍_🙎_🙏_🚀_🚁_🚂_🚃_🚄_🚅_🚆_🚇_🚈_🚉_🚊_🚋_🚌_🚍_🚎_🚏_🚐_🚑_🚒_🚓_🚔_🚕_🚖_🚗_🚘_🚙_🚚_🚛_🚜_🚝_🚞_🚟_🚠_🚡_🚢_🚣_🚤_🚥_🚦_🚧_🚨_🚩_🚪_🚫_🚬_🚭_🚮_🚯_🚰_🚱_🚲_🚳_🚴_🚵_🚶_🚷_🚸_🚹_🚺_🚻_🚼_🚽_🚾_🚿_🛀_🛁_🛂_🛃_🛄_🛅_🛋_🛌_🛍_🛎_🛏_🛐_🛠_🛡_🛢_🛣_🛤_🛥_🛩_🛫_🛬_🛰_🛳_🤐_🤑_🤒_🤓_🤔_🤕_🤖_🤗_🤘_🦀_🦁_🦂_🦃_🦄_🧀

regex

来源：https://stackoverflow.com/questions/40902662/find-all-the-matches-for-unicodes-in-a-string-in-python

1条答案

按热度按时间

svujldwt1#

搜索unicode范围与搜索任何类型的字符范围完全相同。但是，您需要正确地表示字符串。下面是一个工作示例：

#coding: utf-8
import re

b=u"united thats weak. See ya 😇 "
assert re.findall(u'[\U0001f600-\U0001f650]',b) == [u'😇']
assert re.findall(ur'[😀-🙏]',b) == [u'😇']

备注：

在程序的第一行或第二行需要#coding: utf-8或类似的值。
在你的例子中，你使用的表情符号U-1f 44 b不在U-1f 600到U-1f 650的范围内。在我的例子中，我使用了一个。
如果你想使用\U来包含一个unicode字符，你不能使用原始字符串前缀（r''）。
但是如果使用字符本身（而不是\U转义），那么可以使用原始字符串前缀。
您需要确保模式和输入字符串都是unicode字符串。它们都不能是UTF8编码的字符串。
但是您不需要re.U标志，除非您的模式包含\s、\w或类似的内容。

赞(0）回复(0）举报 12个月前

我来回答

regex 在python中查找一个字符串中的所有unicodes匹配项

1条答案

相关问题

热门标签

最新问答