I ran a simple speed test on my new machine:
import numpy as np
A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)
%timeit A.dot(B)
The result was:
30.3 ms ± 829 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Compared with what others typically see (under 10 ms on average), this seems unusually slow. I would like to know what might explain this behavior.
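As a cross-check outside IPython, the same product can be timed with the standard time module; this is a minimal sketch (the warm-up call and the repeat count of 10 are my own choices, not from the original test):

```python
import time
import numpy as np

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# Warm up once so BLAS thread pools and caches are initialized before timing.
C = A.dot(B)

start = time.perf_counter()
for _ in range(10):
    A.dot(B)
elapsed = (time.perf_counter() - start) / 10
print(f"mean per matmul: {elapsed * 1e3:.1f} ms")
```

If this agrees with the `%timeit` number, the slowdown is in the BLAS call itself rather than in notebook overhead.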
My system is macOS Big Sur on an Apple M1 chip, Python 3.8.13, NumPy 1.22.4, installed via
pip install "numpy==1.22.4"
The output of np.show_config() is:
openblas64__info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
libraries = ['openblas64_', 'openblas64_']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42
not found = AVX,F16C,FMA3,AVX2,AVX512F,AVX512CD,AVX512_KNL,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
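The SIMD list above contains only x86 extensions (SSE, SSSE3, SSE4.x), which suggests this NumPy build is an x86_64 wheel running under Rosetta 2 rather than a native arm64 build. One quick way to check which architecture the interpreter itself reports (a sketch using only the standard library):

```python
import platform

# Under Rosetta 2 this typically prints 'x86_64';
# a native build on Apple Silicon prints 'arm64'.
print(platform.machine())
```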
Edit:
I ran another test with this snippet (taken from [1]):
import time
import numpy as np

np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10
timecosts = []
for _ in range(runtimes):
    s_time = time.time()
    for i in range(100):
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.time() - s_time)

print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
My result was:
mean of 10 runs: 6.17438s
while the reference results from [1] (measured on an M1 Max chip) are:
+-----------------------------------+-----------------------+--------------------+
| Python installed by (run on)→ | Miniforge (native M1) | Anaconda (Rosetta) |
+----------------------+------------+------------+----------+----------+---------+
| Numpy installed by ↓ | Run from → | Terminal | PyCharm | Terminal | PyCharm |
+----------------------+------------+------------+----------+----------+---------+
| Apple Tensorflow | 4.19151 | 4.86248 | / | / |
+-----------------------------------+------------+----------+----------+---------+
| conda install numpy | 4.29386 | 4.98370 | 4.10029 | 4.99271 |
+-----------------------------------+------------+----------+----------+---------+
Judging from these results, my timings are slower than every NumPy configuration in the reference.
2 Answers

Answer 1:
I noticed a similar slowdown on my M1 as well, but I think the real cause, at least on my machine, is not a fundamentally broken NumPy installation but rather a problem with the benchmark itself. Computing

x = a.T @ a; eigh(x)

takes 2 ms, while eigh(a.T @ a) takes 400 ms. I think %timeit has some issue in the latter case; perhaps the computation gets routed to the "efficiency cores" for some reason? My tentative answer is that your first %timeit benchmark is not reliable.
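The two-expression comparison above can be reproduced without `%timeit` by timing both forms manually; this is a sketch under my own choices (300×300 matrix, best-of-5 timing), not the answerer's exact setup:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(size=(300, 300))

def bench(fn, repeats=5):
    # Best-of-N wall-clock time, to reduce scheduler noise.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

t1 = bench(lambda: np.linalg.eigh(a.T @ a))  # combined expression
x = a.T @ a
t2 = bench(lambda: np.linalg.eigh(x))        # product precomputed
print(f"eigh(a.T @ a): {t1*1e3:.1f} ms, eigh(x): {t2*1e3:.1f} ms")
```

If the two numbers agree here but differ under `%timeit`, that supports the claim that the discrepancy comes from the measurement, not the math.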
Answer 2:

If you suspect the timing itself, try using the time module instead of %timeit.

For more about NumPy on Apple silicon, read the first answer at the link below. For best performance, it is recommended to use Apple's Accelerate vecLib. If you install through conda, also check @AndrejHribernik's comment: Why Python native on M1 Max is greatly slower than Python on old Intel i5?
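One way to get a native arm64 NumPy linked against Accelerate is through a conda-forge environment; this is a hypothetical setup sketch (the environment name is arbitrary, and the `libblas=*=*accelerate` build selector is conda-forge's mechanism for choosing a BLAS implementation):

```shell
# Create a fresh environment from a native (arm64) Miniforge install,
# then pin the BLAS metapackage to Apple's Accelerate framework.
conda create -n native-np python numpy "libblas=*=*accelerate"
conda activate native-np
python -c "import numpy; numpy.show_config()"
```

After this, np.show_config() should report Accelerate rather than OpenBLAS, and the matmul timing above is worth rerunning.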