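I read https://github.com/golang/go/issues/25484, which describes a no-copy conversion from []byte to string.
I am wondering whether there is a way to convert a string to a byte slice without a memory copy.
I am writing a program that processes terabytes of data; if every string is copied twice in memory, it will slow the process down. I don't care about mutability or unsafety, this is for internal use only, and I just need it to be as fast as possible.
Example: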
var s string
// some processing on s, for some reasons, I must use string here
// ...
// then output to a writer
gzipWriter.Write([]byte(s)) // !!! Here I want to avoid the memory copy, no WriteString
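So the question is: is there a way to prevent the memory copy? I know I may need the unsafe package, but I don't know how to use it for this. I have searched for a while and found no answer so far, and the related answers that SO shows do not work either.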
7 Answers
lmyy7pcs1#
Getting the content of a string as a []byte without copying is, in general, only possible using unsafe, because strings in Go are immutable, and without a copy it would be possible to modify the contents of the string (by changing the elements of the byte slice).
So a solution has to use unsafe. The corrected, working solution is from Ian Lance Taylor.
One thing to note here: the empty string "" has no bytes, as its length is zero. This means there is no guarantee what the Data field may be; it may be zero or an arbitrary address shared among the zero-size variables. If an empty string may be passed, that must be checked explicitly (although there is no need to get the bytes of an empty string without copying).
The original solution was wrong; see Nuno Cruces's answer below for the reasoning.
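A minimal sketch of the approach described above, assuming the pointer-to-array slicing technique with the 0x7fff0000 constant that a later answer attributes to the accepted answer. The function name unsafeGetBytes and the test strings are illustrative choices, and the empty-string guard is the explicit check mentioned above.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// unsafeGetBytes returns the bytes of s without copying.
// The result aliases the string's memory and must never be written to.
func unsafeGetBytes(s string) []byte {
	if s == "" {
		return nil // Data of an empty string is unspecified, so do not touch it
	}
	// View the string data as a pointer to a very large array,
	// then slice that array down to exactly len(s) bytes.
	return (*[0x7fff0000]byte)(unsafe.Pointer(
		(*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
	)[:len(s):len(s)]
}

func main() {
	s := "hi"
	data := unsafeGetBytes(s)
	fmt.Println(data, string(data)) // [104 105] hi

	data = unsafeGetBytes("gopher")
	fmt.Println(data, string(data)) // [103 111 112 104 101 114] gopher
}
```

The three-index slice expression sets the capacity to len(s), so an append on the result allocates a new array instead of writing past the string.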
BUT: You wrote that you want this because you need performance. You also mentioned that you want to compress the data. Please know that compressing data (using gzip) requires a lot more computation than just copying a few bytes! You will not see any noticeable performance gain from this!
Instead, when you want to write strings to an io.Writer, it is recommended to do it via the io.WriteString() function, which if possible will do so without making a copy of the string (by checking for and calling the WriteString() method, which, if it exists, most likely does it better than copying the string). For details, see What's the difference between ResponseWriter.Write and io.WriteString?
There are also ways to access the contents of a string without converting it to []byte, such as indexing, or using a loop of the form for i, v := range []byte(s), where the compiler optimizes away the copy.
Also see related questions:
[]byte(string) vs []byte(*string)
What are the possible consequences of using unsafe conversion from []byte to string in go?
What is the difference between the string and []byte in Go?
Does conversion between alias types in Go create copies?
How does type conversion internally work? What is the memory utilization for the same?
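The two copy-free alternatives mentioned in this answer can be sketched as follows. This is only an illustration: render and countByte are hypothetical helper names, and bytes.Buffer stands in for any writer that has a WriteString method.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// render writes s through io.WriteString, which calls the writer's own
// WriteString method when it has one (as *bytes.Buffer does).
func render(s string) string {
	var buf bytes.Buffer
	if _, err := io.WriteString(&buf, s); err != nil {
		return ""
	}
	return buf.String()
}

// countByte reads the bytes of s by ranging over []byte(s);
// this is the loop form the answer says is compiled without a copy.
func countByte(s string, c byte) int {
	n := 0
	for _, v := range []byte(s) {
		if v == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(render("gopher"))         // gopher
	fmt.Println(countByte("gopher", 'o')) // 1
}
```

Whether io.WriteString avoids a copy depends on the destination: for a writer without a WriteString method it falls back to Write([]byte(s)).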
js4nwp542#
After some extensive investigation, I believe I've discovered the most efficient way of getting a []byte from a string as of Go 1.17 (this is for i386/x86_64 gc; I haven't tested other architectures). The trade-off of being efficient code here is being inefficient to code, though.
Before I say anything else, it should be made clear that the differences are ultimately very small and probably inconsequential; the info below is for fun/educational purposes only.
Summary
With some minor alterations, the accepted answer illustrating the technique of slicing a pointer to array is the most efficient way. That being said, I wouldn't be surprised if unsafe.Slice becomes the (decisively) better choice in the future.
unsafe.Slice
unsafe.Slice currently has the advantage of being slightly more readable, but I'm skeptical about its performance. It appears to make a call to runtime.unsafeslice. In the gc amd64 1.17 assembly of the function provided in Atamiri's answer (FUNCDATA omitted), note the stack check (lack of NOSPLIT).
Other unimportant fun facts about it (easily subject to change): compiled size of 3326 B; inline cost of 7; correct escape analysis: s leaks to ~r1 with derefs=0.
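For reference, a sketch of what an unsafe.Slice based conversion can look like in Go 1.17. It is assumed to resemble the function from Atamiri's answer rather than to reproduce it exactly; the name unsafeGetBytes is illustrative.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// unsafeGetBytes builds a []byte over the string's data pointer with
// unsafe.Slice (Go 1.17+). The result must be treated as read-only.
func unsafeGetBytes(s string) []byte {
	data := (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
	return unsafe.Slice((*byte)(unsafe.Pointer(data)), len(s))
}

func main() {
	b := unsafeGetBytes("gopher")
	fmt.Println(len(b), cap(b), string(b)) // 6 6 gopher
}
```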
Carefully Modifying *reflect.SliceHeader
This method has the advantage/disadvantage of letting one modify the internal state of a slice directly. Unfortunately, due to its multiline nature and use of uintptr, the GC can easily mess things up if one is not careful about keeping a reference to the original string. (In my version I avoided creating temporary pointers to reduce inline cost and to avoid needing to add runtime.KeepAlive.)
Other unimportant fun facts about the corresponding amd64 assembly (easily subject to change): compiled size of 3700 B; inline cost of 20; subpar escape analysis: s leaks to {heap} with derefs=0.
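A sketch of the header-modifying method described in this section, under the assumption that each field is written through a fresh conversion so that no temporary header pointer is kept. The function name is illustrative, and reflect.SliceHeader and reflect.StringHeader are deprecated in newer Go releases.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// unsafeGetBytes fills in the header of the result slice b field by field.
// The string's data address is stored straight into b's header, so it is
// never held only in a standalone uintptr variable.
func unsafeGetBytes(s string) (b []byte) {
	(*reflect.SliceHeader)(unsafe.Pointer(&b)).Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
	(*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = len(s)
	(*reflect.SliceHeader)(unsafe.Pointer(&b)).Len = len(s)
	return b
}

func main() {
	b := unsafeGetBytes("gopher")
	fmt.Println(len(b), cap(b), string(b)) // 6 6 gopher
}
```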
Unsafer version of modifying SliceHeader
Adapted from Nuno Cruces' answer. This relies on the inherent structural similarity between StringHeader and SliceHeader, so in a sense it breaks "more easily". Additionally, it temporarily creates an illegal state where cap(b) (being 0) is less than len(b).
Other unimportant details about the corresponding assembly: compiled size 3636 B, inline cost of 11, with subpar escape analysis: s leaks to {heap} with derefs=0.
Slicing a pointer to array
This is the accepted answer (shown here for comparison). Its primary disadvantage is its ugliness (viz. the magic number 0x7fff0000). There's also the tiniest possibility of getting a string bigger than the array, and an unavoidable bounds check.
Other unimportant details about the corresponding assembly: compiled size 3142 B, inline cost of 9, with correct escape analysis: s leaks to ~r1 with derefs=0.
Note the runtime.panicSlice3Alen in the assembly: this is the bounds check that verifies that len(s) is within 0x7fff0000.
Improved slicing pointer to array
This is what I've concluded to be the most efficient method as of Go 1.17. I basically modified the accepted answer to eliminate the bounds check, and found a "more meaningful" constant (math.MaxInt32) to use than 0x7fff0000. Using MaxInt32 preserves 32-bit compatibility.
Other unimportant details about the corresponding assembly: compiled size 3188 B, inline cost of 13, and correct escape analysis: s leaks to ~r1 with derefs=0.
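One way such an improved version could be written is sketched below. Masking the length with math.MaxInt32 is an assumption about how the bounds check gets eliminated, and the empty-string guard is an addition for safety that the measurements above may not include. Note that the mask silently truncates strings longer than MaxInt32 bytes.

```go
package main

import (
	"fmt"
	"math"
	"reflect"
	"unsafe"
)

// unsafeGetBytes slices a pointer to a [math.MaxInt32]byte array.
// Masking len(s) keeps both slice indices provably within the array
// bounds, which is what lets the compiler drop the bounds check.
func unsafeGetBytes(s string) []byte {
	if s == "" {
		return nil // the data pointer of an empty string may be nil
	}
	return (*[math.MaxInt32]byte)(unsafe.Pointer(
		(*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
	)[: len(s)&math.MaxInt32 : len(s)&math.MaxInt32]
}

func main() {
	b := unsafeGetBytes("gopher")
	fmt.Println(len(b), cap(b), string(b)) // 6 6 gopher
}
```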
xu3bshqb3#
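In Go 1.17, I recommend using unsafe.Slice, because it is more readable.
I believe there is also a variant that works (without violating any unsafe.Pointer rules) and has the advantage of working for consts.
The accepted answer is wrong and may produce the panic mentioned in the comments by @RFC. @icza's explanation about GC and keep alive is misleading.
The reason the capacity is zero (or even an arbitrary value) is more mundane.
A slice header has Data, Len, and Cap fields, while a string header has only Data and Len.
Converting a byte slice to a string can be done "safely", the same way strings.Builder does it: this copies the Data pointer and Len from the slice to the string. The opposite conversion is not "safe", because Cap does not get set to the correct value (unsafe.Pointer rule #1). The correct code that fixes the panic therefore has to set Cap explicitly.
I should add that all these conversions are unsafe in the sense that strings are supposed to be immutable, while byte arrays/slices are mutable.
But if you know for sure that the byte slice won't be mutated, the conversions above won't run into bounds (or GC) issues.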
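The two directions discussed in this answer can be sketched as follows. The names bytesToString and stringToBytes are illustrative; the second function is one possible way to set Cap explicitly and is not claimed to be the exact code the answer showed.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// bytesToString reinterprets the slice header as a string header.
// Only Data and Len are read; Cap is simply ignored.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// stringToBytes copies Data and Len from s into b's header and then
// sets Cap, which the string header cannot provide.
func stringToBytes(s string) (b []byte) {
	*(*string)(unsafe.Pointer(&b)) = s                      // Data and Len
	(*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = len(s) // fix up Cap
	return b
}

func main() {
	b := stringToBytes("gopher")
	fmt.Println(len(b), cap(b), bytesToString(b)) // 6 6 gopher
}
```

Between the two statements in stringToBytes the slice briefly has a length larger than its capacity, so b must not be used until Cap has been set.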
nxowjjhe4#
In Go 1.17, we can now use unsafe.Slice, so the accepted answer can be rewritten with it.
l5tcr1uw5#
I managed to achieve this, and the output showed that the variables all share the same memory.
6jjcrrmo6#
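Go 1.20 (February 2023)
You can use unsafe.StringData to greatly simplify YenForYang's answer. From the documentation:
StringData returns a pointer to the underlying bytes of str. For an empty string the return value is unspecified, and may be nil.
Since Go strings are immutable, the bytes returned by StringData must not be modified.
Go tip playground: https://go.dev/play/p/FIXe0rb8YHE?v=gotip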
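A minimal sketch of the Go 1.20 approach, combining unsafe.StringData with unsafe.Slice. The function name stringBytes and the empty-string guard are illustrative additions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringBytes returns a read-only []byte view of s without copying.
// Requires Go 1.20 or later for unsafe.StringData.
func stringBytes(s string) []byte {
	if s == "" {
		return nil // StringData is unspecified for the empty string
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringBytes("foobar")
	fmt.Printf("%T, %s\n", b, b) // []uint8, foobar
}
```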
Remember that you can't assign to b[n], because the memory is still read-only.
ncecgwcz7#
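Simple, no reflect, and I think it is portable. s is the string and b is the byte slice.
Remember that the byte values should not be modified (doing so will panic). Re-slicing is fine.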