What is the best way to convert a string to bytes in Python 3?

Posted by xxls0lw8 on 2021-07-13 in Java


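There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".
Which of these methods is better or more Pythonic? Or is it just a matter of personal preference?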

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

python python-3.x String character-encoding

Source: https://stackoverflow.com/questions/67284849/typeerror-a-bytes-like-object-is-required-not-str

4 Answers


rn0zuynd1#

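If you look at the docs for bytes, they point you to bytearray:
bytearray([source[, encoding[, errors]]])
Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has; see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few different ways:
If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
If it is an integer, the array will have that size and will be initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
So bytes can do much more than just encode a string. It is Pythonic that it lets you call the constructor with any type of source parameter that makes sense.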
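As a rough illustration of the source forms described in the documentation excerpt, the sketch below calls the bytes constructor once with each kind of argument. The variable names are made up for this example and do not come from the original answer.

```python
# Each call passes a different kind of source argument to bytes().
from_text = bytes("h\u00e9llo", "utf-8")   # str: an encoding must be given
from_size = bytes(3)                       # int: that many null bytes
from_buffer = bytes(bytearray(b"abc"))     # buffer-protocol object: its contents are copied
from_ints = bytes([104, 105])              # iterable of ints in range(256)
empty = bytes()                            # no argument: empty bytes object

print(from_text)    # b'h\xc3\xa9llo'
print(from_size)    # b'\x00\x00\x00'
print(from_buffer)  # b'abc'
print(from_ints)    # b'hi'
print(empty)        # b''
```

Note that bytes("text") without an encoding raises TypeError, which is the error the string rule in the excerpt guards against.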
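For encoding a string, I think some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.
Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you are just skipping a level of indirection if you call encode yourself.
Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.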
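A minimal sketch of the two points made in this answer: both spellings produce the same bytes, and encode pairs with decode. The sample text and variable names are assumptions chosen for illustration.

```python
# Sample text with two non-ASCII characters (written as escapes to avoid
# any dependence on the source file's encoding).
text = "na\u00efve caf\u00e9"

via_method = text.encode("utf-8")        # explicit verb: "encode this string"
via_constructor = bytes(text, "utf-8")   # constructor form, same result

assert via_method == via_constructor

# decode() is the inverse of encode() when the same encoding is used.
round_trip = via_method.decode("utf-8")
assert round_trip == text

# 10 characters, but 12 bytes: each accented letter takes two bytes in UTF-8.
print(len(text), len(via_method))  # 10 12
```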

Posted 2021-07-13

pbwdgjma2#

It's easier than you might think:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

Posted 2021-07-13

0pizxfdo3#

The absolute best way is neither of the two, but a third. The first parameter of encode has defaulted to 'utf-8' since Python 3.0, so the best way is

b = mystring.encode()

This is also faster, because the default argument reaches the C code not as the string "utf-8" but as NULL, which is much quicker to check!
Here are some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

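Despite the warning, the times were very stable after repeated runs; the deviation was only about 2%.
Using encode() without an argument is not compatible with Python 2, because in Python 2 the default character encoding is ASCII.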
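For readers without IPython, the comparison can be approximated with the standard timeit module. This is only a sketch: the loop count and repeat count are arbitrary choices, and the absolute numbers, and possibly even the gap between the two calls, will vary with the interpreter version and the machine.

```python
import timeit

LOOPS = 200_000   # arbitrary loop count for this sketch
REPEATS = 5       # take the best of several runs to reduce noise

# Time the explicit-encoding call and the default-argument call.
explicit_s = min(timeit.repeat("s.encode('utf-8')", setup="s = 'abc'",
                               number=LOOPS, repeat=REPEATS))
default_s = min(timeit.repeat("s.encode()", setup="s = 'abc'",
                              number=LOOPS, repeat=REPEATS))

# Convert total seconds to nanoseconds per call.
explicit_ns = explicit_s / LOOPS * 1e9
default_ns = default_s / LOOPS * 1e9

print(f"encode('utf-8'): {explicit_ns:.0f} ns per loop")
print(f"encode():        {default_ns:.0f} ns per loop")
```

Either way, both calls return identical bytes in Python 3, so any difference is purely one of speed and brevity.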

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Posted 2021-07-13

kwvwclae4#

An answer to a slightly different problem:
You have a raw Unicode sequence that was saved into a str variable:

s_str: str = "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

You need to be able to get the bytes literal of that Unicode string (for struct.unpack(), etc.):

s_bytes: bytes = b'\x00\x01\x00\xc0\x01\x00\x00\x00\x04'

Solution:

s_new: bytes = bytes(s_str, encoding="raw_unicode_escape")
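The following sketch checks that the conversion reproduces the expected bytes and then feeds the result to struct.unpack(). The format string ">HHIB" is a hypothetical layout invented for this example; the original answer does not say how the nine bytes are meant to be interpreted.

```python
import struct

s_str = "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"
s_bytes = b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

# raw_unicode_escape maps code points below 256 to the byte of the same value.
s_new = bytes(s_str, encoding="raw_unicode_escape")
assert s_new == s_bytes

# For strings whose code points are all below 256, latin-1 gives the same result.
assert s_str.encode("latin-1") == s_bytes

# Hypothetical layout (an assumption): big-endian, two unsigned shorts,
# one unsigned int, one unsigned byte, for 9 bytes in total.
fields = struct.unpack(">HHIB", s_new)
print(fields)  # (1, 192, 16777216, 4)
```

One caveat: for code points of 256 and above, raw_unicode_escape emits backslash escapes such as \uXXXX rather than single bytes, so this trick only recovers the original bytes when every character in the string is below 256.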

Reference (scroll up for the standard encodings):
Python Specific Encodings

Posted 2021-07-13
