What version of Go are you using (`go version`)?
$ go version 1.14.4
While profiling AES-CBC on amd64 and arm64, I found that on arm64 the functions `func xorBytes(dst, a, b []byte) int` and `func safeXORBytes(dst, a, b []byte, n int)` (in crypto/cipher/xor_generic.go) always appear in the top 15 of the pprof listing. On amd64, by contrast, the same operation uses SSE2 SIMD instructions in `func xorBytesSSE2(dst, a, b *byte, n int)`.
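To make the cost difference concrete, here is a minimal sketch of the two styles of XOR loop. It is not the code in crypto/cipher/xor_generic.go: the names `xorBytewise` and `xorWords` are hypothetical, and the 8-byte word loop only stands in for the idea behind a SIMD version, which would process 16 bytes per instruction in assembly.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// xorBytewise (hypothetical name) XORs a and b one byte at a time into dst.
// It returns the number of bytes written: the length of the shorter input.
func xorBytewise(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// xorWords (hypothetical name) produces the same result but handles
// 8 bytes per iteration, then finishes the tail byte by byte.
func xorWords(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	if n == 0 {
		return 0
	}
	_ = dst[n-1] // fail early if dst is too short
	i := 0
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:])
		y := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x^y)
	}
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte("0123456789abcdefXYZ") // 19 bytes: two words plus a 3-byte tail
	b := bytes.Repeat([]byte{0x5a}, len(a))

	d1 := make([]byte, len(a))
	d2 := make([]byte, len(a))
	n1 := xorBytewise(d1, a, b)
	n2 := xorWords(d2, a, b)

	fmt.Println(n1, n2, bytes.Equal(d1, d2))
}
```

Both functions should produce identical output; the difference is only in how many bytes each loop iteration handles, which is where a NEON implementation would gain over a byte-wise fallback.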
```bash
(pprof) top10
Showing nodes accounting for 700ms, 55.12% of 1270ms total
Showing top 10 nodes out of 113
flat flat% sum% cum cum%
170ms 13.39% 13.39% 530ms 41.73% runtime.mallocgc
90ms 7.09% 20.47% 90ms 7.09% crypto/cipher.safeXORBytes
90ms 7.09% 27.56% 130ms 10.24% syscall.Syscall
80ms 6.30% 33.86% 80ms 6.30% runtime.nextFreeFast (inline)
60ms 4.72% 38.58% 60ms 4.72% runtime.publicationBarrier
50ms 3.94% 42.52% 50ms 3.94% crypto/aes.expandKeyAsm
50ms 3.94% 46.46% 140ms 11.02% crypto/cipher.xorBytes
40ms 3.15% 49.61% 40ms 3.15% runtime.acquirem (inline)
40ms 3.15% 52.76% 40ms 3.15% runtime.memclrNoHeapPointers
30ms 2.36% 55.12% 30ms 2.36% crypto/internal/subtle.InexactOverlap
```
Could arm64 SIMD instructions be used to optimize the performance of this function?
2 Answers
w8f9ii69 #1
https://golang.org/cl/142537 addresses this:
crypto/cipher: use Neon for xor on arm64
wsewodh2 #2
Please take a look at my PR #53154, which adds both non-NEON and NEON implementations of xorBytes for ARM. This closes the gap with ARM64.