python-3.x Using half precision with CuPy

3pvhb19x  posted on 2023-06-25  in  Python

I am trying to compile a simple CUDA kernel that uses the half-precision format provided by the cuda_fp16 header.
My kernel looks like this:

code = r'''
extern "C" {

#include <cuda_fp16.h>

__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   } 
}

}
'''

I tried to compile it like this:

import cupy as cp

options = ('-I/path/to/cuda/include/', )

mod = cp.RawModule(code=code, options=options, backend="nvrtc", jitify=True)
func = mod.get_function("kernel")

However, this results in several compiler errors:

---------------------------------------------------
--- JIT compile log for /tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu ---
---------------------------------------------------
cuda_fp16.hpp(266): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(267): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(270): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(271): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(314): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(315): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(316): error: more than one instance of overloaded function "operator*" has "C" linkage

cuda_fp16.hpp(317): error: more than one instance of overloaded function "operator/" has "C" linkage

cuda_fp16.hpp(319): error: more than one instance of overloaded function "operator+=" has "C" linkage

cuda_fp16.hpp(320): error: more than one instance of overloaded function "operator-=" has "C" linkage

cuda_fp16.hpp(321): error: more than one instance of overloaded function "operator*=" has "C" linkage

cuda_fp16.hpp(322): error: more than one instance of overloaded function "operator/=" has "C" linkage

cuda_fp16.hpp(324): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(325): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(326): error: more than one instance of overloaded function "operator++" has "C" linkage

cuda_fp16.hpp(327): error: more than one instance of overloaded function "operator--" has "C" linkage

cuda_fp16.hpp(329): error: more than one instance of overloaded function "operator+" has "C" linkage

cuda_fp16.hpp(330): error: more than one instance of overloaded function "operator-" has "C" linkage

cuda_fp16.hpp(332): error: more than one instance of overloaded function "operator==" has "C" linkage

cuda_fp16.hpp(333): error: more than one instance of overloaded function "operator!=" has "C" linkage

cuda_fp16.hpp(334): error: more than one instance of overloaded function "operator>" has "C" linkage

cuda_fp16.hpp(335): error: more than one instance of overloaded function "operator<" has "C" linkage

cuda_fp16.hpp(336): error: more than one instance of overloaded function "operator>=" has "C" linkage

cuda_fp16.hpp(337): error: more than one instance of overloaded function "operator<=" has "C" linkage

24 errors detected in the compilation of "/tmp/tmpdiqnmomv/25f306a7612419fcd799b8c90718648c2c1313ca.cubin.cu".

Is there something obvious that I am missing?
I am using cupy-cuda11x with CUDA 11.2.

xyhw6mcr #1

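The error message is quite clear: the compiler is telling you that cuda_fp16.hpp contains features that C linkage does not support, in this case overloaded functions (the operators defined for half). Because the #include sits inside the extern "C" { } block, the free functions that the header declares are given C linkage as well, and C linkage does not allow overloads.
I would expect something like this to compile correctly: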

#include <cuda_fp16.h>

extern "C" 
__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   } 
}

In other words, your function has C linkage, but the included header does not.
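Putting the pieces together on the Python side, a minimal sketch might look like the following. It assumes the kernel from this answer, CuPy installed with a CUDA-capable GPU, and that NVRTC can locate cuda_fp16.h (pass the same `-I...` option as in the question through `options` if it cannot). The helper name `copy_half_on_gpu` and the constant `KERNEL_SRC` are made up for illustration, and the sketch leaves out `jitify=True`, which is a departure from the question's call rather than something confirmed to be required.

```python
# Kernel source: the header is included at file scope, and only the kernel
# itself is declared extern "C" so it can be looked up by its unmangled name.
KERNEL_SRC = r'''
#include <cuda_fp16.h>

extern "C"
__global__ void kernel(half * const f1, half * const f2)
{
   if (blockDim.x*blockIdx.x + threadIdx.x < 12 && blockDim.y*blockIdx.y + threadIdx.y < 12)
   {
      const int ctr_0 = blockDim.x*blockIdx.x + threadIdx.x;
      const int ctr_1 = blockDim.y*blockIdx.y + threadIdx.y;
      f1[12*ctr_1 + ctr_0] = f2[12*ctr_1 + ctr_0];
   }
}
'''


def copy_half_on_gpu(src_host, options=()):
    """Copy a 12x12 float16 array from f2 to f1 through the kernel.

    Hypothetical helper; it needs CuPy and a CUDA-capable GPU to run.
    """
    import cupy as cp  # lazy import so this module loads without CuPy

    src = cp.ascontiguousarray(cp.asarray(src_host, dtype=cp.float16))
    if src.shape != (12, 12):
        raise ValueError("the kernel is hard-coded for a 12x12 array")
    dst = cp.zeros_like(src)

    # Assumption: default NVRTC backend without jitify; extra include
    # paths, if needed, are supplied by the caller via `options`.
    mod = cp.RawModule(code=KERNEL_SRC, options=tuple(options), backend="nvrtc")
    func = mod.get_function("kernel")

    # One block of 12x12 threads covers every element; float16 arrays
    # are passed to the kernel as half* arguments (f1=dst, f2=src).
    func((1, 1), (12, 12), (dst, src))
    return cp.asnumpy(dst)
```

On a machine with a working CUDA setup, calling `copy_half_on_gpu` with a 12x12 float16 NumPy array should hand back an array with the same values, which is a quick way to confirm that the module compiled and the half pointers were wired up as intended.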
