C++: How can I see the details behind CPU-only libtorch matrix multiplication routines?

Posted by fhity93d on 2023-06-25 in Other

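I downloaded the CPU-only version of libtorch from the website and unpacked it.
In my .cpp application that uses libtorch, I wrote the following (I use intel-mkl in other parts of the application, and I would like libtorch to use it as well):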

omp_set_num_threads(64);
mkl_set_num_threads(64);

Then I check:

std::cout << "torch::get_num_threads() returns: " << torch::get_num_threads() << std::endl;
std::cout << "omp_get_max_threads() returns: " << omp_get_max_threads() << std::endl;
std::cout << "mkl_get_max_threads() returns: " << mkl_get_max_threads() << std::endl;

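All of these return 64.
(Yes, I really do have that many cores: I am on an HPC machine with 128 cores per node, and I launch 2 MPI processes per node.)
I then perform std::complex<double> matrix-matrix multiplications through calls to torch::matmul().
These multiplications seem slow to me.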
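How can I check:

  1. that libtorch uses MKL under the hood?
  2. that libtorch uses threads for the matrix-matrix multiplication? Do my checks above guarantee that libtorch uses more than 1 thread under the hood?

Thank you!

Edit: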
$ nm -a --demangle libtorch_cpu.so | grep 'zgemv'
000000000b2c5290 T mkl_blas_avx2_xzgemv
000000000b2356e0 T mkl_blas_avx512_xzgemv
000000000b34a430 T mkl_blas_avx_xzgemv
000000000da8d320 T mkl_blas_avx_zgemv_c
000000000da8c310 T mkl_blas_avx_zgemv_n
000000000da8b840 T mkl_blas_avx_zgemv_t
000000000b75bae0 T mkl_blas_cnr_def_xzgemv
000000000e706570 T mkl_blas_cnr_def_zgemv_c
000000000e7072f0 T mkl_blas_cnr_def_zgemv_c_any
000000000e704cc0 T mkl_blas_cnr_def_zgemv_n
000000000e7058f0 T mkl_blas_cnr_def_zgemv_n_any
000000000e703180 T mkl_blas_cnr_def_zgemv_t
000000000e703f00 T mkl_blas_cnr_def_zgemv_t_any
000000000b6a3120 T mkl_blas_def_xzgemv
000000000e5e8370 T mkl_blas_def_zgemv_c
000000000e5e90f0 T mkl_blas_def_zgemv_c_any
000000000e5e6ac0 T mkl_blas_def_zgemv_n
000000000e5e76f0 T mkl_blas_def_zgemv_n_any
000000000e5e4f80 T mkl_blas_def_zgemv_t
000000000e5e5d00 T mkl_blas_def_zgemv_t_any
000000000b3f17d0 T mkl_blas_mc3_xzgemv
000000000de1d390 T mkl_blas_mc3_zgemv_c
000000000de1c380 T mkl_blas_mc3_zgemv_n
000000000de1b8b0 T mkl_blas_mc3_zgemv_t
000000000b4d9a50 T mkl_blas_mc_xzgemv
000000000e1ca5f0 T mkl_blas_mc_zgemv_c
000000000e1c7490 T mkl_blas_mc_zgemv_n
000000000e1c4660 T mkl_blas_mc_zgemv_t
0000000007fc0940 T mkl_blas_xzgemv
0000000007dfd930 T mkl_blas_zgemv
0000000007ed8920 T mkl_blas_zgemv_omp
0000000007ed8030 t mkl_blas_zgemv_omp._omp_fn.0

$ nm -a --demangle libtorch_cpu.so | grep 'zgemm'
00000000188df120 B .gomp_critical_user_mkl_blas_zgemm_omp_acopy_la_cs
0000000007ce5920 T cblas_zgemm_batch
0000000007cf6970 T mkl_blas__zgemm
0000000007cf76a0 T mkl_blas__zgemm_batch
000000000b2752b0 T mkl_blas_avx2_xzgemm
000000000d89f650 T mkl_blas_avx2_xzgemm_acopiedbcopy
000000000b2769f0 T mkl_blas_avx2_xzgemm_bdz
000000000b273320 T mkl_blas_avx2_xzgemm_internal_team
000000000b276a00 T mkl_blas_avx2_xzgemm_par
000000000b29c9b0 T mkl_blas_avx2_xzgemmger
000000000b272d70 T mkl_blas_avx2_xzgemmt
000000000b277610 T mkl_blas_avx2_zgemm_api_support
000000000b2769c0 T mkl_blas_avx2_zgemm_blk_info_bdz
000000000d899c70 T mkl_blas_avx2_zgemm_copyac
000000000b2920d0 T mkl_blas_avx2_zgemm_copyac_htn
000000000b276b50 T mkl_blas_avx2_zgemm_copyan
000000000b2920b0 T mkl_blas_avx2_zgemm_copyan_htn
000000000b276b10 T mkl_blas_avx2_zgemm_copyat
000000000b2920c0 T mkl_blas_avx2_zgemm_copyat_htn
000000000d899c30 T mkl_blas_avx2_zgemm_copybc
000000000b2920a0 T mkl_blas_avx2_zgemm_copybc_htn
000000000b276ad0 T mkl_blas_avx2_zgemm_copybn
000000000b292080 T mkl_blas_avx2_zgemm_copybn_htn
000000000b276a90 T mkl_blas_avx2_zgemm_copybt
000000000b292090 T mkl_blas_avx2_zgemm_copybt_htn
000000000b2770a0 T mkl_blas_avx2_zgemm_free_bufs
000000000b292170 T mkl_blas_avx2_zgemm_freebufs
000000000b276a70 T mkl_blas_avx2_zgemm_get_blks_size
000000000b276bd0 T mkl_blas_avx2_zgemm_get_bufs
000000000b276e70 T mkl_blas_avx2_zgemm_get_bufs_pack
000000000b276a80 T mkl_blas_avx2_zgemm_get_bufs_size
000000000b276a20 T mkl_blas_avx2_zgemm_get_kernel
000000000b276a40 T mkl_blas_avx2_zgemm_get_kernel_version
000000000b276a30 T mkl_blas_avx2_zgemm_get_optimal_kernel
000000000b276da0 T mkl_blas_avx2_zgemm_get_size_bufs
000000000b292160 T mkl_blas_avx2_zgemm_getbufs
000000000b2769d0 T mkl_blas_avx2_zgemm_getbufs_bdz
000000000b2770c0 T mkl_blas_avx2_zgemm_initialize_buffers
000000000b276150 T mkl_blas_avx2_zgemm_initialize_kernel_info
000000000b2760a0 T mkl_blas_avx2_zgemm_initialize_strategy
000000000d8b00c0 T mkl_blas_avx2_zgemm_ker0
000000000d8b0040 T mkl_blas_avx2_zgemm_ker0_cnr
000000000f697e00 T mkl_blas_avx2_zgemm_kernel_0
000000000f69a800 T mkl_blas_avx2_zgemm_kernel_0_b0
000000000f69bc00 T mkl_blas_avx2_zgemm_kernel_0_b0_cnr
000000000b2769e0 T mkl_blas_avx2_zgemm_kernel_0_bdz
000000000f699300 T mkl_blas_avx2_zgemm_kernel_0_cnr
000000000b2760e0 T mkl_blas_avx2_zgemm_map_thread_to_kernel
000000000b29f6c0 T mkl_blas_avx2_zgemm_mscale
000000000d8b0020 T mkl_blas_avx2_zgemm_mscale_wrapper
000000000b276a50 T mkl_blas_avx2_zgemm_num_kernels
000000000b295810 T mkl_blas_avx2_zgemm_pst
000000000b276a60 T mkl_blas_avx2_zgemm_set_blks_size
000000000b276f70 T mkl_blas_avx2_zgemm_set_bufs_pack
000000000b2bfb40 T mkl_blas_avx2_zgemm_sm_01
000000000b2bf8d0 T mkl_blas_avx2_zgemm_sm_01_10
000000000d9bc870 T mkl_blas_avx2_zgemm_sm_02
000000000d9b3250 T mkl_blas_avx2_zgemm_sm_03
000000000d9a7be0 T mkl_blas_avx2_zgemm_sm_04
000000000d99a650 T mkl_blas_avx2_zgemm_sm_05
000000000d98bac0 T mkl_blas_avx2_zgemm_sm_06
000000000d97a6d0 T mkl_blas_avx2_zgemm_sm_07
000000000d966d00 T mkl_blas_avx2_zgemm_sm_08
000000000d951770 T mkl_blas_avx2_zgemm_sm_09
000000000d939e70 T mkl_blas_avx2_zgemm_sm_10
000000000f697200 T mkl_blas_avx2_zgemm_zccopy_down2_ea
000000000f695900 T mkl_blas_avx2_zgemm_zccopy_right6_ea
000000000d67d600 T mkl_blas_avx2_zgemm_zcopy_down2_ea
000000000d67b200 T mkl_blas_avx2_zgemm_zcopy_down6_ea
000000000d67aa00 T mkl_blas_avx2_zgemm_zcopy_right2_ea
000000000d679100 T mkl_blas_avx2_zgemm_zcopy_right6_ea
000000000b276a10 T mkl_blas_avx2_zgemm_zero_desc
000000000d92fef0 T mkl_blas_avx2_zgemmt_nobufs
000000000b1d9fd0 T mkl_blas_avx512_xzgemm
000000000d478600 T mkl_blas_avx512_xzgemm_acopiedbcopy
000000000b1db700 T mkl_blas_avx512_xzgemm_bdz
000000000b1d8150 T mkl_blas_avx512_xzgemm_internal_team
000000000b1db710 T mkl_blas_avx512_xzgemm_par
000000000b20ce30 T mkl_blas_avx512_xzgemmger
000000000b1d7ba0 T mkl_blas_avx512_xzgemmt
000000000b1dc320 T mkl_blas_avx512_zgemm_api_support
000000000b1db6d0 T mkl_blas_avx512_zgemm_blk_info_bdz
000000000d473110 T mkl_blas_avx512_zgemm_copyac
000000000b1f42d0 T mkl_blas_avx512_zgemm_copyac_htn
000000000b1db860 T mkl_blas_avx512_zgemm_copyan
000000000b1f42b0 T mkl_blas_avx512_zgemm_copyan_htn
000000000b1db820 T mkl_blas_avx512_zgemm_copyat
000000000b1f42c0 T mkl_blas_avx512_zgemm_copyat_htn
000000000d4730d0 T mkl_blas_avx512_zgemm_copybc
000000000b1f42a0 T mkl_blas_avx512_zgemm_copybc_htn
000000000b1db7e0 T mkl_blas_avx512_zgemm_copybn
000000000b1f4280 T mkl_blas_avx512_zgemm_copybn_htn
000000000b1db7a0 T mkl_blas_avx512_zgemm_copybt
000000000b1f4290 T mkl_blas_avx512_zgemm_copybt_htn
000000000b1dbdb0 T mkl_blas_avx512_zgemm_free_bufs
000000000b1f4370 T mkl_blas_avx512_zgemm_freebufs
000000000b1db780 T mkl_blas_avx512_zgemm_get_blks_size
000000000b1db8e0 T mkl_blas_avx512_zgemm_get_bufs
000000000b1dbb80 T mkl_blas_avx512_zgemm_get_bufs_pack
000000000b1db790 T mkl_blas_avx512_zgemm_get_bufs_size
000000000b1db730 T mkl_blas_avx512_zgemm_get_kernel
000000000b1db750 T mkl_blas_avx512_zgemm_get_kernel_version
000000000b1db740 T mkl_blas_avx512_zgemm_get_optimal_kernel
000000000b1dbab0 T mkl_blas_avx512_zgemm_get_size_bufs
000000000b1f4360 T mkl_blas_avx512_zgemm_getbufs
000000000b1db6e0 T mkl_blas_avx512_zgemm_getbufs_bdz
000000000b1dbdd0 T mkl_blas_avx512_zgemm_initialize_buffers
000000000b1dae70 T mkl_blas_avx512_zgemm_initialize_kernel_info
000000000b1dadc0 T mkl_blas_avx512_zgemm_initialize_strategy
000000000d481b70 T mkl_blas_avx512_zgemm_ker0
000000000d481af0 T mkl_blas_avx512_zgemm_ker0_cnr
000000000f37be00 T mkl_blas_avx512_zgemm_kernel_0
000000000f374e00 T mkl_blas_avx512_zgemm_kernel_0_b0
000000000f36de00 T mkl_blas_avx512_zgemm_kernel_0_b0_cnr
000000000b1db6f0 T mkl_blas_avx512_zgemm_kernel_0_bdz
000000000f366c00 T mkl_blas_avx512_zgemm_kernel_0_cnr
000000000b1dae00 T mkl_blas_avx512_zgemm_map_thread_to_kernel
000000000b20f7f0 T mkl_blas_avx512_zgemm_mscale
000000000d481ad0 T mkl_blas_avx512_zgemm_mscale_wrapper
000000000b1db760 T mkl_blas_avx512_zgemm_num_kernels
000000000b205840 T mkl_blas_avx512_zgemm_pst
000000000b1db770 T mkl_blas_avx512_zgemm_set_blks_size
000000000b1dbc80 T mkl_blas_avx512_zgemm_set_bufs_pack
000000000b22fac0 T mkl_blas_avx512_zgemm_sm_01
000000000b22f850 T mkl_blas_avx512_zgemm_sm_01_10
000000000d5b6a90 T mkl_blas_avx512_zgemm_sm_02
000000000d5ad2d0 T mkl_blas_avx512_zgemm_sm_03
000000000d5a18d0 T mkl_blas_avx512_zgemm_sm_04
000000000d5940d0 T mkl_blas_avx512_zgemm_sm_05
000000000d583fb0 T mkl_blas_avx512_zgemm_sm_06
000000000d571570 T mkl_blas_avx512_zgemm_sm_07
000000000d55c150 T mkl_blas_avx512_zgemm_sm_08
000000000d5446a0 T mkl_blas_avx512_zgemm_sm_09
000000000d529d40 T mkl_blas_avx512_zgemm_sm_10
000000000f365900 T mkl_blas_avx512_zgemm_zccopy_down4_ea
000000000f363000 T mkl_blas_avx512_zgemm_zccopy_right12_ea
000000000cf87200 T mkl_blas_avx512_zgemm_zcopy_down12_ea
000000000cf85f00 T mkl_blas_avx512_zgemm_zcopy_down4_ea
000000000cf83600 T mkl_blas_avx512_zgemm_zcopy_right12_ea
000000000cf82800 T mkl_blas_avx512_zgemm_zcopy_right4_ea
000000000b1db720 T mkl_blas_avx512_zgemm_zero_desc
000000000d4d7cf0 T mkl_blas_avx512_zgemmt_nobufs
000000000b301bf0 T mkl_blas_avx_xzgemm
000000000b2fb630 T mkl_blas_avx_xzgemm_bdz
000000000b2fb5a0 T mkl_blas_avx_xzgemm_internal
000000000b2fb5b0 T mkl_blas_avx_xzgemm_internal_team
000000000b2ff170 T mkl_blas_avx_xzgemm_par
000000000b3159e0 T mkl_blas_avx_xzgemmger
000000000b2fda20 T mkl_blas_avx_xzgemmt
000000000b2fe690 T mkl_blas_avx_zgemm_api_support
000000000b2fb600 T mkl_blas_avx_zgemm_blk_info_bdz
000000000dac39e0 T mkl_blas_avx_zgemm_copyac
000000000b3118a0 T mkl_blas_avx_zgemm_copyac_htn
000000000dac35f0 T mkl_blas_avx_zgemm_copyan
000000000b311880 T mkl_blas_avx_zgemm_copyan_htn
000000000dac2a00 T mkl_blas_avx_zgemm_copyat
000000000b311890 T mkl_blas_avx_zgemm_copyat_htn
000000000dac2630 T mkl_blas_avx_zgemm_copybc
000000000b311870 T mkl_blas_avx_zgemm_copybc_htn
000000000dac11f0 T mkl_blas_avx_zgemm_copybn
000000000b311850 T mkl_blas_avx_zgemm_copybn_htn
000000000dac0e90 T mkl_blas_avx_zgemm_copybt
000000000b311860 T mkl_blas_avx_zgemm_copybt_htn
000000000dac0e80 T mkl_blas_avx_zgemm_free_bufs
000000000b311940 T mkl_blas_avx_zgemm_freebufs
000000000dac0d10 T mkl_blas_avx_zgemm_get_blks_size
000000000f7fb410 T mkl_blas_avx_zgemm_get_bufs
000000000b2fb2a0 T mkl_blas_avx_zgemm_get_bufs_size
000000000dac0d00 T mkl_blas_avx_zgemm_get_kernel_version
000000000b2fe580 T mkl_blas_avx_zgemm_get_optimal_kernel
000000000b311930 T mkl_blas_avx_zgemm_getbufs
000000000b2fb610 T mkl_blas_avx_zgemm_getbufs_bdz
000000000b2fb650 T mkl_blas_avx_zgemm_initialize_buffers
000000000b2fb640 T mkl_blas_avx_zgemm_initialize_kernel_info
000000000dac0b30 T mkl_blas_avx_zgemm_ker0
000000000f809c30 T mkl_blas_avx_zgemm_ker0_pst
000000000f7eb770 T mkl_blas_avx_zgemm_kernel_0
000000000f7ea100 T mkl_blas_avx_zgemm_kernel_0_b0
000000000b2fb620 T mkl_blas_avx_zgemm_kernel_0_bdz
000000000b2fb590 T mkl_blas_avx_zgemm_map_thread_to_kernel
000000000b319450 T mkl_blas_avx_zgemm_mscale
000000000b342750 T mkl_blas_avx_zgemm_pst
000000000b2fe570 T mkl_blas_avx_zgemm_set_blks_size
000000000b33b520 T mkl_blas_avx_zgemm_sm_01
000000000b33b2b0 T mkl_blas_avx_zgemm_sm_01_10
000000000dcbbd90 T mkl_blas_avx_zgemm_sm_02
000000000dcb08e0 T mkl_blas_avx_zgemm_sm_03
000000000dca3910 T mkl_blas_avx_zgemm_sm_04
000000000dc94cc0 T mkl_blas_avx_zgemm_sm_05
000000000dc83790 T mkl_blas_avx_zgemm_sm_06
000000000dc6f640 T mkl_blas_avx_zgemm_sm_07
000000000dc58a60 T mkl_blas_avx_zgemm_sm_08
000000000dc401c0 T mkl_blas_avx_zgemm_sm_09
000000000dc25100 T mkl_blas_avx_zgemm_sm_10
000000000b2fe4b0 T mkl_blas_avx_zgemm_zero_desc
000000000b32fc10 T mkl_blas_avx_zgemmt_nobufs
000000000b71fce0 T mkl_blas_cnr_def_xzgemm
000000000b71f700 T mkl_blas_cnr_def_xzgemm_bdz
000000000e738030 T mkl_blas_cnr_def_xzgemm_brc
000000000b71f5e0 T mkl_blas_cnr_def_xzgemm_internal
000000000b71f5f0 T mkl_blas_cnr_def_xzgemm_internal_team
000000000b72ee00 T mkl_blas_cnr_def_xzgemm_par
000000000b737b70 T mkl_blas_cnr_def_xzgemmger
000000000b71eb30 T mkl_blas_cnr_def_xzgemmt
000000000b71f6f0 T mkl_blas_cnr_def_zgemm_api_support
000000000b71f620 T mkl_blas_cnr_def_zgemm_blk_info_bdz
000000000b756c10 T mkl_blas_cnr_def_zgemm_copyac
000000000e737f30 T mkl_blas_cnr_def_zgemm_copyac_bdz
000000000f9ba270 T mkl_blas_cnr_def_zgemm_copyac_brc
000000000b7569d0 T mkl_blas_cnr_def_zgemm_copyan
000000000e737f10 T mkl_blas_cnr_def_zgemm_copyan_bdz
000000000f9b9e80 T mkl_blas_cnr_def_zgemm_copyan_brc
000000000b7566c0 T mkl_blas_cnr_def_zgemm_copyat
000000000e737ef0 T mkl_blas_cnr_def_zgemm_copyat_bdz
000000000f9b9a80 T mkl_blas_cnr_def_zgemm_copyat_brc
000000000b756550 T mkl_blas_cnr_def_zgemm_copybc
000000000e737eb0 T mkl_blas_cnr_def_zgemm_copybc_bdz
000000000f9b9700 T mkl_blas_cnr_def_zgemm_copybc_brc
000000000b7563b0 T mkl_blas_cnr_def_zgemm_copybn
000000000e737e70 T mkl_blas_cnr_def_zgemm_copybn_bdz
000000000f9b9320 T mkl_blas_cnr_def_zgemm_copybn_brc
000000000b7562a0 T mkl_blas_cnr_def_zgemm_copybt
000000000e737e30 T mkl_blas_cnr_def_zgemm_copybt_bdz
000000000f9b8fe0 T mkl_blas_cnr_def_zgemm_copybt_brc
000000000b72edf0 T mkl_blas_cnr_def_zgemm_free_bufs
000000000e738020 T mkl_blas_cnr_def_zgemm_freebufs_bdz
000000000b72edc0 T mkl_blas_cnr_def_zgemm_get_blks_size
000000000f9b8f30 T mkl_blas_cnr_def_zgemm_get_blks_size_brc
000000000b72ee30 T mkl_blas_cnr_def_zgemm_get_bufs
000000000f9ba5b0 T mkl_blas_cnr_def_zgemm_get_bufs_brc
000000000b72ee10 T mkl_blas_cnr_def_zgemm_get_bufs_size
000000000b72ee20 T mkl_blas_cnr_def_zgemm_get_kernel
000000000b72edb0 T mkl_blas_cnr_def_zgemm_get_optimal_kernel
000000000e737f50 T mkl_blas_cnr_def_zgemm_getbufs_bdz
000000000b71f610 T mkl_blas_cnr_def_zgemm_initialize_buffers
000000000b71f600 T mkl_blas_cnr_def_zgemm_initialize_kernel_info
000000000e6c3c90 T mkl_blas_cnr_def_zgemm_inner
000000000e6bedf0 T mkl_blas_cnr_def_zgemm_inner_b_roll
000000000e6ba110 T mkl_blas_cnr_def_zgemm_inner_roll
000000000e6b54f0 T mkl_blas_cnr_def_zgemm_inner_z_roll
000000000e6a0a00 T mkl_blas_cnr_def_zgemm_kernel_0_bdz
000000000f996800 T mkl_blas_cnr_def_zgemm_kernel_0_brc
000000000e69fc20 T mkl_blas_cnr_def_zgemm_kernel_0_zen
000000000b71f5d0 T mkl_blas_cnr_def_zgemm_map_thread_to_kernel
000000000b755d10 T mkl_blas_cnr_def_zgemm_mscale
000000000b72ede0 T mkl_blas_cnr_def_zgemm_num_kernels
000000000b7502a0 T mkl_blas_cnr_def_zgemm_pst
000000000b74fe90 T mkl_blas_cnr_def_zgemm_scalm
000000000b72edd0 T mkl_blas_cnr_def_zgemm_set_blks_size
000000000f995c00 T mkl_blas_cnr_def_zgemm_zccopy_down2_bdz
000000000f994d00 T mkl_blas_cnr_def_zgemm_zccopy_right4_bdz
000000000f994100 T mkl_blas_cnr_def_zgemm_zcopy_down2_bdz
000000000f992b00 T mkl_blas_cnr_def_zgemm_zcopy_down4_bdz
000000000f992300 T mkl_blas_cnr_def_zgemm_zcopy_right2_bdz
000000000f991400 T mkl_blas_cnr_def_zgemm_zcopy_right4_bdz
000000000b72eda0 T mkl_blas_cnr_def_zgemm_zero_desc
000000000b74fd60 T mkl_blas_cnr_def_zgemm_zerom
000000000b74a420 T mkl_blas_cnr_def_zgemmt_nobufs
000000000b6634f0 T mkl_blas_def_xzgemm
000000000b662f10 T mkl_blas_def_xzgemm_bdz
000000000e61a000 T mkl_blas_def_xzgemm_brc
000000000b6620d0 T mkl_blas_def_xzgemm_internal
000000000b6620e0 T mkl_blas_def_xzgemm_internal_team
000000000b672610 T mkl_blas_def_xzgemm_par
000000000b67b290 T mkl_blas_def_xzgemmger
000000000b662390 T mkl_blas_def_xzgemmt
000000000b662f00 T mkl_blas_def_zgemm_api_support
000000000b662e30 T mkl_blas_def_zgemm_blk_info_bdz
000000000b69e010 T mkl_blas_def_zgemm_copyac
000000000e619f00 T mkl_blas_def_zgemm_copyac_bdz
000000000f96da70 T mkl_blas_def_zgemm_copyac_brc
000000000b69ddd0 T mkl_blas_def_zgemm_copyan
000000000e619ee0 T mkl_blas_def_zgemm_copyan_bdz
000000000f96d680 T mkl_blas_def_zgemm_copyan_brc
000000000b69d9d0 T mkl_blas_def_zgemm_copyat
000000000e619ec0 T mkl_blas_def_zgemm_copyat_bdz
000000000f96d280 T mkl_blas_def_zgemm_copyat_brc
000000000b69d860 T mkl_blas_def_zgemm_copybc
000000000e619e80 T mkl_blas_def_zgemm_copybc_bdz
000000000f96cf00 T mkl_blas_def_zgemm_copybc_brc
000000000b69d6c0 T mkl_blas_def_zgemm_copybn
000000000e619e40 T mkl_blas_def_zgemm_copybn_bdz
000000000f96cb20 T mkl_blas_def_zgemm_copybn_brc
000000000b69d5b0 T mkl_blas_def_zgemm_copybt
000000000e619e00 T mkl_blas_def_zgemm_copybt_bdz
000000000f96c7e0 T mkl_blas_def_zgemm_copybt_brc
000000000b672600 T mkl_blas_def_zgemm_free_bufs
000000000e619ff0 T mkl_blas_def_zgemm_freebufs_bdz
000000000b6725d0 T mkl_blas_def_zgemm_get_blks_size
000000000f96c730 T mkl_blas_def_zgemm_get_blks_size_brc
000000000b672640 T mkl_blas_def_zgemm_get_bufs
000000000f96ddb0 T mkl_blas_def_zgemm_get_bufs_brc
000000000b672620 T mkl_blas_def_zgemm_get_bufs_size
000000000b672630 T mkl_blas_def_zgemm_get_kernel
000000000b6725c0 T mkl_blas_def_zgemm_get_optimal_kernel
000000000e619f20 T mkl_blas_def_zgemm_getbufs_bdz
000000000b662100 T mkl_blas_def_zgemm_initialize_buffers
000000000b6620f0 T mkl_blas_def_zgemm_initialize_kernel_info
000000000e5a5a90 T mkl_blas_def_zgemm_inner
000000000e5a0bf0 T mkl_blas_def_zgemm_inner_b_roll
000000000e59bf10 T mkl_blas_def_zgemm_inner_roll
000000000e5972f0 T mkl_blas_def_zgemm_inner_z_roll
000000000e5828e0 T mkl_blas_def_zgemm_kernel_0_bdz
000000000f94a000 T mkl_blas_def_zgemm_kernel_0_brc
000000000e581b00 T mkl_blas_def_zgemm_kernel_0_zen
000000000b6620c0 T mkl_blas_def_zgemm_map_thread_to_kernel
000000000b67e0c0 T mkl_blas_def_zgemm_mscale
000000000b6725f0 T mkl_blas_def_zgemm_num_kernels
000000000b697b40 T mkl_blas_def_zgemm_pst
000000000b697730 T mkl_blas_def_zgemm_scalm
000000000b6725e0 T mkl_blas_def_zgemm_set_blks_size
000000000f949400 T mkl_blas_def_zgemm_zccopy_down2_bdz
000000000f948500 T mkl_blas_def_zgemm_zccopy_right4_bdz
000000000f947900 T mkl_blas_def_zgemm_zcopy_down2_bdz
000000000f946300 T mkl_blas_def_zgemm_zcopy_down4_bdz
000000000f945b00 T mkl_blas_def_zgemm_zcopy_right2_bdz
000000000f944c00 T mkl_blas_def_zgemm_zcopy_right4_bdz
000000000b6725b0 T mkl_blas_def_zgemm_zero_desc
000000000b697600 T mkl_blas_def_zgemm_zerom
000000000b691cc0 T mkl_blas_def_zgemmt_nobufs
0000000007d0b540 T mkl_blas_errchk_zgemm
0000000007d0b750 T mkl_blas_errchk_zgemm_batch
0000000007d08b60 T mkl_blas_errchk_zgemm_batch_ilp64
0000000007d08940 T mkl_blas_errchk_zgemm_ilp64
000000000b3be250 T mkl_blas_mc3_xzgemm
000000000b3c8600 T mkl_blas_mc3_xzgemm_abcopied_htn
000000000b3c85e0 T mkl_blas_mc3_xzgemm_acopied_htn
000000000b3c85f0 T mkl_blas_mc3_xzgemm_bcopied_htn
000000000b3b9750 T mkl_blas_mc3_xzgemm_bdz
000000000b3b96c0 T mkl_blas_mc3_xzgemm_internal
000000000b3b96d0 T mkl_blas_mc3_xzgemm_internal_team
000000000b3bb7d0 T mkl_blas_mc3_xzgemm_par
000000000b3cc290 T mkl_blas_mc3_xzgemmger
000000000b3ba3a0 T mkl_blas_mc3_xzgemmt
000000000b3bb000 T mkl_blas_mc3_zgemm_api_support
000000000b3b9720 T mkl_blas_mc3_zgemm_blk_info_bdz
000000000b3c85d0 T mkl_blas_mc3_zgemm_blk_info_htn
000000000dfd13b0 T mkl_blas_mc3_zgemm_copyac
000000000b3c85b0 T mkl_blas_mc3_zgemm_copyac_htn
000000000dfd0ed0 T mkl_blas_mc3_zgemm_copyan
000000000b3c8570 T mkl_blas_mc3_zgemm_copyan_htn
000000000dfd0cf0 T mkl_blas_mc3_zgemm_copyat
000000000b3c8580 T mkl_blas_mc3_zgemm_copyat_htn
000000000dfd0860 T mkl_blas_mc3_zgemm_copybc
000000000b3c85c0 T mkl_blas_mc3_zgemm_copybc_htn
000000000dfd0390 T mkl_blas_mc3_zgemm_copybn
000000000b3c8590 T mkl_blas_mc3_zgemm_copybn_htn
000000000dfcff60 T mkl_blas_mc3_zgemm_copybt
000000000b3c85a0 T mkl_blas_mc3_zgemm_copybt_htn
000000000de44fc0 T mkl_blas_mc3_zgemm_free_bufs
000000000f873d10 T mkl_blas_mc3_zgemm_get_blks_size
000000000f873940 T mkl_blas_mc3_zgemm_get_bufs
000000000b3b93c0 T mkl_blas_mc3_zgemm_get_bufs_size
000000000de44fb0 T mkl_blas_mc3_zgemm_get_kernel_version
000000000b3baef0 T mkl_blas_mc3_zgemm_get_optimal_kernel
000000000b3b9730 T mkl_blas_mc3_zgemm_getbufs_bdz
000000000b3b9770 T mkl_blas_mc3_zgemm_initialize_buffers
000000000b3b9760 T mkl_blas_mc3_zgemm_initialize_kernel_info
000000000f8737a0 T mkl_blas_mc3_zgemm_ker0
000000000f87e780 T mkl_blas_mc3_zgemm_ker0_pst
000000000fe26670 T mkl_blas_mc3_zgemm_kernel_0_0
000000000fe25130 T mkl_blas_mc3_zgemm_kernel_0_1
000000000b3b9740 T mkl_blas_mc3_zgemm_kernel_0_bdz
000000000b3b96b0 T mkl_blas_mc3_zgemm_map_thread_to_kernel
000000000b3cef20 T mkl_blas_mc3_zgemm_mscale
000000000b3eb880 T mkl_blas_mc3_zgemm_pst
000000000b3b93b0 T mkl_blas_mc3_zgemm_set_blks_size
000000000b3e8160 T mkl_blas_mc3_zgemm_sm_01
000000000b3e7ef0 T mkl_blas_mc3_zgemm_sm_01_10
000000000dfcc600 T mkl_blas_mc3_zgemm_sm_02
000000000dfc7bd0 T mkl_blas_mc3_zgemm_sm_03
000000000dfc1f40 T mkl_blas_mc3_zgemm_sm_04
000000000dfbae70 T mkl_blas_mc3_zgemm_sm_05
000000000dfb2d50 T mkl_blas_mc3_zgemm_sm_06
000000000dfa9780 T mkl_blas_mc3_zgemm_sm_07
000000000df9ed70 T mkl_blas_mc3_zgemm_sm_08
000000000df92e80 T mkl_blas_mc3_zgemm_sm_09
000000000df859b0 T mkl_blas_mc3_zgemm_sm_10
000000000b3bae30 T mkl_blas_mc3_zgemm_zero_desc
000000000b3df7d0 T mkl_blas_mc3_zgemmt_nobufs
000000000b4552b0 T mkl_blas_mc_xzgemm
000000000b44f620 T mkl_blas_mc_xzgemm_bdz
000000000b44f590 T mkl_blas_mc_xzgemm_internal
000000000b44f5a0 T mkl_blas_mc_xzgemm_internal_team
000000000b4527d0 T mkl_blas_mc_xzgemm_par
000000000b46e390 T mkl_blas_mc_xzgemmger
000000000b44f970 T mkl_blas_mc_xzgemmt
000000000b4508e0 T mkl_blas_mc_zgemm_api_support
000000000b44f5f0 T mkl_blas_mc_zgemm_blk_info_bdz
000000000e1ebba0 T mkl_blas_mc_zgemm_copya_ext
000000000e3a51e0 T mkl_blas_mc_zgemm_copyac
000000000e3a4da0 T mkl_blas_mc_zgemm_copyac_htn
000000000e3a4740 T mkl_blas_mc_zgemm_copyan
000000000e3a40e0 T mkl_blas_mc_zgemm_copyan_htn
000000000e3a3cf0 T mkl_blas_mc_zgemm_copyat
000000000e3a3900 T mkl_blas_mc_zgemm_copyat_htn
000000000e1ebb80 T mkl_blas_mc_zgemm_copyb_ext
000000000e3a33f0 T mkl_blas_mc_zgemm_copybc
000000000e3a3050 T mkl_blas_mc_zgemm_copybc_htn
000000000e3a2bc0 T mkl_blas_mc_zgemm_copybn
000000000e3a2710 T mkl_blas_mc_zgemm_copybn_htn
000000000e3a2250 T mkl_blas_mc_zgemm_copybt
000000000e3a1dc0 T mkl_blas_mc_zgemm_copybt_htn
000000000e1ebb70 T mkl_blas_mc_zgemm_free_bufs
000000000e1eba40 T mkl_blas_mc_zgemm_get_blks_size
000000000e1eb940 T mkl_blas_mc_zgemm_get_blks_size_htn
000000000e1eb570 T mkl_blas_mc_zgemm_get_bufs
000000000b4505f0 T mkl_blas_mc_zgemm_get_bufs_size
000000000e1eb460 T mkl_blas_mc_zgemm_get_kernel
000000000e1eb450 T mkl_blas_mc_zgemm_get_kernel_version
000000000b4504d0 T mkl_blas_mc_zgemm_get_optimal_kernel
000000000b44f600 T mkl_blas_mc_zgemm_getbufs_bdz
000000000f89c050 T mkl_blas_mc_zgemm_htn_ker0_0_0
000000000f89b890 T mkl_blas_mc_zgemm_htn_ker0_0_1
000000000f903c30 T mkl_blas_mc_zgemm_htn_ker0_pst
000000000b44f640 T mkl_blas_mc_zgemm_initialize_buffers
000000000b44f630 T mkl_blas_mc_zgemm_initialize_kernel_info
000000000e1eb0d0 T mkl_blas_mc_zgemm_ker0
000000000f88ff40 T mkl_blas_mc_zgemm_ker0_full
000000000f8898f0 T mkl_blas_mc_zgemm_ker0_general
000000000e1eb290 T mkl_blas_mc_zgemm_ker0_htn
000000000f902c90 T mkl_blas_mc_zgemm_ker0_pst
000000000b44f610 T mkl_blas_mc_zgemm_kernel_0_bdz
000000000b44f580 T mkl_blas_mc_zgemm_map_thread_to_kernel
000000000b471610 T mkl_blas_mc_zgemm_mscale
000000000b4d3510 T mkl_blas_mc_zgemm_pst
000000000b4504c0 T mkl_blas_mc_zgemm_set_blks_size
000000000b4cf670 T mkl_blas_mc_zgemm_sm_01
000000000b4cf400 T mkl_blas_mc_zgemm_sm_01_10
000000000e39dc50 T mkl_blas_mc_zgemm_sm_02
000000000e398820 T mkl_blas_mc_zgemm_sm_03
000000000e391ef0 T mkl_blas_mc_zgemm_sm_04
000000000e389fd0 T mkl_blas_mc_zgemm_sm_05
000000000e380f60 T mkl_blas_mc_zgemm_sm_06
000000000e376720 T mkl_blas_mc_zgemm_sm_07
000000000e36aa30 T mkl_blas_mc_zgemm_sm_08
000000000e35d530 T mkl_blas_mc_zgemm_sm_09
000000000e34e8c0 T mkl_blas_mc_zgemm_sm_10
000000000b450400 T mkl_blas_mc_zgemm_zero_desc
000000000b4c6010 T mkl_blas_mc_zgemmt_nobufs
0000000007fc1510 T mkl_blas_xzgemm
0000000007fc1730 T mkl_blas_xzgemm_bdz
0000000007fc0fd0 T mkl_blas_xzgemm_internal_team
0000000007fc0d80 T mkl_blas_xzgemm_par
0000000007fc1300 T mkl_blas_xzgemmger
0000000007fc0b40 T mkl_blas_xzgemmt
0000000007dad2f0 T mkl_blas_zgemm
0000000007ee7c20 T mkl_blas_zgemm_1D_col
0000000007ee78f0 T mkl_blas_zgemm_1D_row
0000000007eeaee0 T mkl_blas_zgemm_1D_with_copy_0
0000000007ee8580 T mkl_blas_zgemm_2D_abcopy_abx_m_km_par_p
0000000007ee7f50 T mkl_blas_zgemm_2D_bcopy
0000000007ee6f60 T mkl_blas_zgemm_2D_bsrc
0000000007ee7280 T mkl_blas_zgemm_2D_improved_bsrc
0000000007eeb830 T mkl_blas_zgemm_2D_xgemm_p
0000000007fc00a0 T mkl_blas_zgemm_api_support
0000000007db4110 T mkl_blas_zgemm_batch
0000000007fbfec0 T mkl_blas_zgemm_blk_info_bdz
0000000007fbfd10 T mkl_blas_zgemm_get_bufs_size
0000000007fbfbf0 T mkl_blas_zgemm_get_optimal_kernel
0000000007fbfa70 T mkl_blas_zgemm_initialize_buffers
0000000007fbf8c0 T mkl_blas_zgemm_initialize_kernel_info
0000000007fbf780 T mkl_blas_zgemm_map_thread_to_kernel
0000000007fbf5f0 T mkl_blas_zgemm_mscale
0000000007eea070 T mkl_blas_zgemm_omp_driver_v1
0000000007ee9310 t mkl_blas_zgemm_omp_driver_v1._omp_fn.0
0000000007ee9470 t mkl_blas_zgemm_omp_driver_v1._omp_fn.1
0000000007ee9c40 t mkl_blas_zgemm_omp_driver_v1._omp_fn.2
0000000007ee9c30 T mkl_blas_zgemm_omp_free_prototype_memory
0000000007ee95d0 T mkl_blas_zgemm_omp_get_prototype
0000000007fbf470 T mkl_blas_zgemm_set_blks_size
0000000007eeb5e0 T mkl_blas_zgemm_xgemm_external_omp
0000000007fbf350 T mkl_blas_zgemm_zero_desc
0000000007dfdcd0 T mkl_blas_zgemmger
0000000007ed8c80 T mkl_blas_zgemmger_omp
0000000007ed8b40 t mkl_blas_zgemmger_omp._omp_fn.0
0000000007da3d70 T mkl_blas_zgemmt
0000000007ee66c0 T mkl_blas_zgemmt_omp_driver_v1
00000000080e0060 t mkl_lapack_zgemm_team
0000000007cf6970 T zgemm
0000000007cf6970 T zgemm_
0000000007cf7030 T zgemm_64
0000000007cf7030 T zgemm_64_
0000000007cf76a0 T zgemm_batch
0000000007cf76a0 T zgemm_batch_
0000000007cf7c00 T zgemm_batch_64
0000000007cf7c00 T zgemm_batch_64_

Answer (de90aj5v):

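In general, there are a few different ways to do this, at different levels of complexity. You can:
1. Check whether libtorch is linked against intel-mkl, using ldd on libtorch or a similar command.
2. Check whether libtorch references symbols for the specific intel-mkl routines you are interested in, using a command such as nm -a --demangle <some path to libtorch.*.so> | grep 'mkl'.
3. Profile the specific calls to torch::matmul() at runtime with a tool such as VTune, oprofile, or perf.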
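As a runtime counterpart to the ldd check, the sketch below lists the shared objects that are actually mapped into the running process and keeps those whose path mentions a few substrings. It relies on glibc's dl_iterate_phdr, so it is Linux-specific; the helper names and the substring list ("torch", "mkl", "gomp", "iomp") are my own assumptions about what is worth looking for. Note that the nm listing in the question shows MKL symbols defined (type T) inside libtorch_cpu.so itself, which suggests MKL is statically linked into that library; in that case no separate libmkl_* entry is expected for libtorch, and the more informative result is which OpenMP runtime(s) show up. The `._omp_fn` and `.gomp_critical_user_` symbols in the listing are GCC-style OpenMP names, which hints at GNU libgomp.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc, Linux only)

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// True if `path` contains any of the substrings in `needles`.
bool matches_any(const std::string& path, const std::vector<std::string>& needles) {
    for (const auto& n : needles) {
        if (path.find(n) != std::string::npos) return true;
    }
    return false;
}

// File names of all shared objects currently mapped into this process.
// The main executable is reported with an empty name and is skipped.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t /*size*/, void* data) -> int {
            auto* out = static_cast<std::vector<std::string>*>(data);
            if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
                out->emplace_back(info->dlpi_name);
            }
            return 0;  // returning 0 continues the iteration
        },
        &names);
    return names;
}

// Prints loaded objects whose path mentions torch, MKL or an OpenMP runtime.
// Call it after the first torch::matmul() so anything loaded on demand is included.
void print_relevant_libraries() {
    const std::vector<std::string> needles = {"torch", "mkl", "gomp", "iomp"};
    for (const auto& name : loaded_objects()) {
        if (matches_any(name, needles)) std::cout << name << '\n';
    }
}
```

If both a libgomp and a libiomp5 appear, two independent OpenMP runtimes are loaded, and thread settings made through one of them (for example, omp_set_num_threads in your own code) are not guaranteed to reach the other.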
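(1) is a very basic sanity check, and it won't tell you much more than that the particular libtorch library you are using will at least try to pull in some symbols from intel-mkl. It can help you debug whether you are using the libtorch build you expect, and whether that build can traverse your system's library paths to find the intel-mkl version it is supposed to find. This is rarely the problem, but it is quick and easy to check, and you don't want to spend longer than necessary debugging something like a broken MKL installation.
(2) will tell you more about which MKL routines your copy of libtorch references statically, although notably *not* which code paths are required at runtime for those routines to be called.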
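The nm listing in the question contains many copies of the same routine with different infixes (avx, avx2, avx512, mc, mc3, def, cnr_def), which appear to be CPU-specific variants that MKL selects between at runtime. The sketch below does not depend on what each tag means; it only tallies the symbols that mention a given routine, grouped by tag, so it is easier to see which variants are compiled in. The tag list is copied from that listing and is an assumption, not MKL's full naming scheme, and the function name is hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Counts `nm` symbols of the form mkl_blas_<tag>_... that mention `routine`,
// grouped by <tag>. Symbols without one of the known tags (e.g. mkl_blas_zgemm)
// are counted under "generic"; symbols not starting with mkl_blas_ are ignored.
std::map<std::string, int> count_mkl_variants(const std::string& nm_output,
                                              const std::string& routine) {
    // Tags as they appear in the question's listing; not an exhaustive list.
    const std::vector<std::string> tags = {"avx512", "avx2", "avx", "mc3", "mc", "cnr_def", "def"};
    const std::string prefix = "mkl_blas_";
    std::map<std::string, int> counts;
    std::istringstream in(nm_output);
    std::string line;
    while (std::getline(in, line)) {
        // nm prints "<address> <type> <name>"; the symbol name is the last field.
        const auto space = line.rfind(' ');
        const std::string sym = (space == std::string::npos) ? line : line.substr(space + 1);
        if (sym.rfind(prefix, 0) != 0 || sym.find(routine) == std::string::npos) continue;
        const std::string rest = sym.substr(prefix.size());
        std::string tag = "generic";
        for (const auto& t : tags) {
            // Matching "<tag>_" at the start keeps "avx_" distinct from "avx2_", "def_" from "cnr_def_".
            if (rest.rfind(t + "_", 0) == 0) {
                tag = t;
                break;
            }
        }
        ++counts[tag];
    }
    return counts;
}
```

Feeding it the output of nm -a --demangle libtorch_cpu.so shows which variants exist in the binary; which one actually runs on your CPU is only observable at runtime, for example with a profiler.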
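(3) is almost certainly what you want, and it will answer in much more depth why these calls seem slow. My guess is that if you are using MKL, VTune is probably available too (for AMD, consider the similar tool μProf). If you are on a system with Intel CPUs, VTune will also give detailed insight into those microarchitectures, provided the VTune version you use was released after the processor you are running on (again, likely). A VTune callstack report can tell you which paths in your code call down into libtorch, whether any of those paths end up in specific MKL routines, *and then* which of those paths take a lot of time *and why*. VTune thread analysis can also tell you whether your OMP/MKL/torch threads have well-balanced chunks of data to work on, whether other threads you haven't accounted for are taking up time or making them run irregularly, and so on. Similar analyses are available in other tools, but combining and visualizing the results may simply take more time.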
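To put a number on "seems slow" before (or alongside) a profiler, here is a small sketch: count the real flops of a complex GEMM (8·m·n·k, since one complex multiply-add is 4 real multiplications plus 4 real additions), time the call, and compare the achieved GFLOP/s across thread settings. It also reads the Threads: field of /proc/self/status (Linux only), which counts the OS threads actually alive in the process, whereas omp_get_max_threads() and torch::get_num_threads() only report the configured upper bound. All helper names here are hypothetical, not part of libtorch or MKL.

```cpp
#include <chrono>
#include <fstream>
#include <sstream>
#include <string>

// Real floating-point operations for C = A * B with complex operands,
// A: m x k, B: k x n. One complex multiply-add = 4 real mul + 4 real add = 8 flops.
double complex_gemm_flops(long m, long n, long k) {
    return 8.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Runs `fn` once as a warm-up, then times `reps` calls and returns achieved GFLOP/s.
template <class Fn>
double measure_gflops(Fn&& fn, double flops_per_call, int reps) {
    fn();  // warm-up: lets thread pools and buffers be created outside the timed region
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) fn();
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return flops_per_call * reps / dt.count() / 1e9;
}

// Parses the "Threads:" field of /proc/<pid>/status-style text; returns -1 if absent.
int parse_thread_count(const std::string& status_text) {
    std::istringstream in(status_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("Threads:", 0) == 0) return std::stoi(line.substr(8));
    }
    return -1;
}

// Linux only: number of OS threads currently alive in this process (-1 if unreadable).
int current_thread_count() {
    std::ifstream f("/proc/self/status");
    std::stringstream buf;
    buf << f.rdbuf();
    return parse_thread_count(buf.str());
}
```

In the application, one might call measure_gflops([&] { C = torch::matmul(A, B); }, complex_gemm_flops(m, n, k), 10) once with the current settings and once after torch::set_num_threads(1). If the two results are close, or current_thread_count() does not grow after the first matmul, the multiplication is probably not running multi-threaded; OpenMP runtimes typically create their worker threads lazily at the first parallel region, so compare the count before and after that first call. If libtorch headers are at hand, at::get_parallel_info() (declared in ATen/Parallel.h) also returns a text summary of the thread settings as seen from inside libtorch, including its OpenMP and MKL values.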
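There are also trickier, more time-consuming options: you can hook your own analysis and logging into assembly code, such as the code in MKL. Based on what you have described so far, this is unlikely to be what you need, but it is a last resort if the existing tools don't give you what you need. If you are convinced you need such techniques and that they will be worth the effort, A study of Binary Instrumentation techniques by Soumyakant Priyadarshan, Intel's PIN dynamic binary instrumentation tool, or similar may be good resources.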
