c++ Problem releasing a CUDA program in Visual Studio 2019 with the CUDA 12.0 toolkit [closed]

j5fpnvbx  posted on 2023-02-10  in  Other

I'm trying to build the Release configuration of my CUDA program, written in C++, with VS2019. The Release build fails with error MSB3721. The code I run is:

#include<cuda.h>
#include<cuda_runtime.h>
#include<cuda_runtime_api.h>
#include "device_launch_parameters.h"
#include<stdio.h>
#include<iostream>
#include "stuff.cuh"

using namespace std;

__global__ void some_kernel(double *vec_a, stuff* thing, int size) 
{

    int ii = blockIdx.x * blockDim.x + threadIdx.x;

    if (ii < size) {
        stuff current_thing(ii, 5);
        current_thing.calculate_stuff(2.341, ii);
        vec_a[ii] = current_thing.get_prop();
    }
}

int main() {
    double* vec_1, * d_vec_1;
    int N = 20;
    stuff* thing;
    stuff* d_thing;

    vec_1 = (double*)malloc(N * sizeof(double));
    thing = (stuff*)malloc(sizeof(stuff));
    *thing = stuff::stuff(3, N);

    for (int ii = 0; ii < N; ii++) {
        vec_1[ii] = 0;
    }

    cudaMalloc(&d_vec_1, N * sizeof(double));
    cudaMalloc(&d_thing, sizeof(stuff));

    cudaMemcpy(d_vec_1, vec_1, N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_thing, thing, sizeof(stuff), cudaMemcpyHostToDevice);

    some_kernel <<< N/256 + 1, 256 >>> (d_vec_1, d_thing,  N);
    cudaThreadSynchronize();

    cudaMemcpy(vec_1, d_vec_1, N * sizeof(double), cudaMemcpyDeviceToHost);

    for (int ii = 0; ii < N; ii++) {
        std::cout << "vec[" << ii << "] = " << vec_1[ii] << std::endl;
    }
}

The class used in the example is defined as follows:

//Header file:
#pragma once
#include<cuda.h>
#include<cuda_runtime.h>
#include<cuda_runtime_api.h>
#include<stdio.h>
#include <math.h>
#include<iostream>

class stuff
{
private:
    int index;
    double* vector_1;
    double* vector_2;
    double prop;
    int size;

public:
    __host__ __device__ stuff(int, int);
    __host__ __device__ ~stuff();

    __device__ void calculate_stuff(double, int);
    __device__ double get_prop();
    __device__ int get_index();
};

//Source code:
#include "stuff.cuh"

__host__ __device__ stuff::stuff(int ind, int size_in) {
    index = ind;
    prop = 1;
    size = size_in;

    vector_1 = (double*)malloc(size * sizeof(double));
    vector_2 = (double*)malloc(size * sizeof(double));

    for (int ii = 0; ii < size; ii++) {
        vector_1[ii] = (4 * ii + ind) / (ind + ii + 1);
        vector_2[ii] = (-2.14 * ii + ind) / (2*ind + ii + 1);
    }
}

__host__ __device__ stuff::~stuff() {};

__device__ void stuff::calculate_stuff(double coeff, int N) {
    prop = 1;

    for (int ii = 0; ii < N; ii++) {
        for (int jj = 0; jj < size; jj++) {
            prop += pow(-1, ii) * vector_1[jj] * vector_2[jj]*coeff;
        }
    }
}

__device__ double stuff::get_prop() {
    return prop;
}

__device__ int stuff::get_index() {
    return index;
}

Basically, the code creates instances of the class stuff on the GPU, performs calculations with them, and returns the results to the host through the device vector d_vec_1, which is copied back into vec_1.
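As a side note on the launch configuration: the kernel is started with N/256 + 1 blocks of 256 threads, and the ii < size guard keeps the surplus threads from writing out of bounds. The sketch below reproduces that index arithmetic on the host in plain C++ (no CUDA needed) and compares it with the common ceiling-division form. The helper names blocks_plus_one, blocks_ceil, active_threads and launch_config_example are made up for this illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// Formula used in the launch above: always at least one block, but one
// block more than necessary when n is an exact multiple of the block size.
constexpr int blocks_plus_one(int n, int threads_per_block) {
    return n / threads_per_block + 1;
}

// Ceiling division: the smallest block count with blocks * threads >= n.
constexpr int blocks_ceil(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Host-side imitation of the kernel's index calculation and its guard:
// counts how many of the launched threads would pass `ii < n`.
int active_threads(int n, int blocks, int threads_per_block) {
    int active = 0;
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            // Same as blockIdx.x * blockDim.x + threadIdx.x in the kernel.
            int ii = block * threads_per_block + thread;
            if (ii < n) ++active;
        }
    }
    return active;
}

// Usage with the values from the question (N = 20, 256 threads per block).
void launch_config_example() {
    assert(blocks_plus_one(20, 256) == 1);
    assert(blocks_ceil(20, 256) == 1);
    // Exactly N threads do work; the other 236 are idle because of the guard.
    assert(active_threads(20, blocks_plus_one(20, 256), 256) == 20);
}
```

For N = 20 both formulas give one block, so the launch in the question covers all elements. The two only differ when N is a multiple of 256, where N/256 + 1 starts one extra block whose threads all fail the guard.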
The detailed error description with high verbosity is given below.
Error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin\nvcc.exe" -gencode=arch=compute_80,code="sm_80,compute_80" -gencode=arch=compute_86,code="sm_86,compute_86" -gencode=arch=compute_87,code="sm_87,compute_87" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared -DNDEBUG -D_CONSOLE -D_UNICODE -DUNICODE -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Release\vc142.pdb /FS /RTC1 /MDd " -o x64\Release\Beam.cu.obj "D:\Projekt\Ray_tracing\RayTracer\RayTracing\Beam.cu"" exited with code 255. Project: RayTracing. File: C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 12.0.targets. Line: 785.
In the Debug configuration the code works fine. I'm using the NVIDIA CUDA Toolkit 12.0 and my GPU is an NVIDIA RTX 3070. Therefore, I set the CUDA code generation property to compute_80,sm_80; compute_86,sm_86; and compute_87,sm_87 for the Ampere architecture. The Debug configuration uses compute_52,sm_52, but that setting does not work for the Release build either. The same error occurred with CUDA 11.4, which I had installed before.
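For reference, each compute_XX,sm_XX pair in the code generation property shows up in the nvcc command line of the error message as one -gencode option. The sketch below rebuilds that option string in plain C++ so the mapping is easy to see. The helper names gencode_flag and gencode_example are hypothetical, and the argument is the compute capability written without the dot (86 for 8.6, which is the compute capability of the RTX 3070).

```cpp
#include <cassert>
#include <string>

// Builds one -gencode option in the same shape as the ones in the
// nvcc command line quoted in the error message.
// cc is the compute capability without the dot, e.g. 86 for 8.6.
std::string gencode_flag(int cc) {
    const std::string n = std::to_string(cc);
    return "-gencode=arch=compute_" + n + ",code=\"sm_" + n + ",compute_" + n + "\"";
}

// Usage: the pair that matches an RTX 3070 (compute capability 8.6).
void gencode_example() {
    assert(gencode_flag(86) == "-gencode=arch=compute_86,code=\"sm_86,compute_86\"");
}
```

Under the assumption that the binary only has to run on the RTX 3070, the compute_86,sm_86 pair alone would be enough. The other two pairs only add code for other Ampere-generation devices.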
I checked the CUDA environment path, which seems to be correct. I'm not familiar with all the configuration options, and for now I would just like my code to work.
Thank you in advance for your help!

yyhrrdl8


Your project appears to be configured for CUDA 11.4.
Replace all "CUDA 11.4" entries in the *.vcxproj file with "CUDA 12.0".
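The replacement itself can be done in any text editor. As a minimal sketch of what changes, the following plain C++ helper performs the same substitution on a string. The name replace_all and the sample Import line are assumptions for illustration: projects created from the CUDA Visual Studio template usually reference the toolkit version in their BuildCustomizations import lines, but the actual entries in this project's *.vcxproj may differ.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `from` in `text` with `to` and returns the result.
std::string replace_all(std::string text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;  // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Usage on an assumed, typical import line from a CUDA project file.
void vcxproj_example() {
    const std::string before =
        "<Import Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 11.4.props\" />";
    const std::string after = replace_all(before, "CUDA 11.4", "CUDA 12.0");
    assert(after ==
        "<Import Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 12.0.props\" />");
}
```

It is probably worth keeping a backup copy of the *.vcxproj file before editing it, and reloading the project in Visual Studio afterwards.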
