I'm trying to build my CUDA program, written in C++ with VS2019, in the Release configuration. The build fails with error MSB3721. The code is:
#include<cuda.h>
#include<cuda_runtime.h>
#include<cuda_runtime_api.h>
#include "device_launch_parameters.h"
#include<stdio.h>
#include<iostream>
#include "stuff.cuh"
using namespace std;
__global__ void some_kernel(double *vec_a, stuff* thing, int size)
{
    int ii = blockIdx.x * blockDim.x + threadIdx.x;
    if (ii < size) {
        stuff current_thing(ii, 5);
        current_thing.calculate_stuff(2.341, ii);
        vec_a[ii] = current_thing.get_prop();
    }
}
int main() {
    double* vec_1, * d_vec_1;
    int N = 20;
    stuff* thing;
    stuff* d_thing;
    vec_1 = (double*)malloc(N * sizeof(double));
    thing = (stuff*)malloc(sizeof(stuff));
    *thing = stuff::stuff(3, N);
    for (int ii = 0; ii < N; ii++) {
        vec_1[ii] = 0;
    }
    cudaMalloc(&d_vec_1, N * sizeof(double));
    cudaMalloc(&d_thing, sizeof(stuff));
    cudaMemcpy(d_vec_1, vec_1, N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_thing, thing, sizeof(stuff), cudaMemcpyHostToDevice);
    some_kernel <<< N/256 + 1, 256 >>> (d_vec_1, d_thing, N);
    cudaThreadSynchronize();
    cudaMemcpy(vec_1, d_vec_1, N * sizeof(double), cudaMemcpyDeviceToHost);
    for (int ii = 0; ii < N; ii++) {
        std::cout << "vec[" << ii << "] = " << vec_1[ii] << std::endl;
    }
}
The class used in the example is defined as follows:
//Header file:
#pragma once
#include<cuda.h>
#include<cuda_runtime.h>
#include<cuda_runtime_api.h>
#include<stdio.h>
#include <math.h>
#include<iostream>
class stuff
{
private:
    int index;
    double* vector_1;
    double* vector_2;
    double prop;
    int size;
public:
    __host__ __device__ stuff(int, int);
    __host__ __device__ ~stuff();
    __device__ void calculate_stuff(double, int);
    __device__ double get_prop();
    __device__ int get_index();
};
//Source code:
#include "stuff.cuh"
__host__ __device__ stuff::stuff(int ind, int size_in) {
    index = ind;
    prop = 1;
    size = size_in;
    vector_1 = (double*)malloc(size * sizeof(double));
    vector_2 = (double*)malloc(size * sizeof(double));
    for (int ii = 0; ii < size; ii++) {
        vector_1[ii] = (4 * ii + ind) / (ind + ii + 1);
        vector_2[ii] = (-2.14 * ii + ind) / (2*ind + ii + 1);
    }
}
__host__ __device__ stuff::~stuff() {};
__device__ void stuff::calculate_stuff(double coeff, int N) {
    prop = 1;
    for (int ii = 0; ii < N; ii++) {
        for (int jj = 0; jj < size; jj++) {
            prop += pow(-1, ii) * vector_1[jj] * vector_2[jj]*coeff;
        }
    }
}
__device__ double stuff::get_prop() {
    return prop;
}
__device__ int stuff::get_index() {
    return index;
}
Basically, the code creates instances of the class stuff on the GPU, performs calculations, and returns the results to the host in the vector d_vec_1.
The detailed error description with high verbosity is given below.
Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin\nvcc.exe" -gencode=arch=compute_80,code="sm_80,compute_80" -gencode=arch=compute_86,code="sm_86,compute_86" -gencode=arch=compute_87,code="sm_87,compute_87" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared -DNDEBUG -D_CONSOLE -D_UNICODE -DUNICODE -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Release\vc142.pdb /FS /RTC1 /MDd " -o x64\Release\Beam.cu.obj "D:\Projekt\Ray_tracing\RayTracer\RayTracing\Beam.cu"" exited with code 255. RayTracing C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 12.0.targets 785
In the Debug configuration the code works fine. I'm using the NVIDIA CUDA Toolkit 12.0 and my GPU is an NVIDIA RTX 3070. Therefore, I set the CUDA code generation property to compute_80,sm_80, compute_86,sm_86 and compute_87,sm_87 for the Ampere architecture. In the Debug configuration the setting is compute_52,sm_52, but using that for the Release build does not work either. The same error occurred with CUDA 11.4, which I had installed before.
I checked the CUDA environment path, which seems to be right. I'm not familiar with all the configuration options, and for now I would just like my code to work.
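For reference, each compute_XX,sm_XX entry in the code generation property shows up in the logged nvcc command as one -gencode flag. The following is a minimal C++ sketch of that mapping as it appears in the log above. The helper name gencode_flag is hypothetical and not part of the CUDA toolkit; the RTX 3070 has compute capability 8.6, which corresponds to the compute_86,sm_86 entry.

```cpp
#include <string>

// Hypothetical helper (not a CUDA toolkit function): builds the -gencode
// flag for a given compute capability, in the same form that appears in
// the nvcc command line logged by the build.
std::string gencode_flag(int major, int minor) {
    // Compute capability 8.6 becomes the suffix "86".
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc +
           ",code=\"sm_" + cc + ",compute_" + cc + "\"";
}
```

Calling gencode_flag(8, 6) yields the second -gencode flag of the logged command, so a project that only targets the RTX 3070 would need just that one entry.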
Thank you in advance for your help!
1 Answer
Your project appears to be configured for CUDA 11.4.
Replace all "CUDA 11.4" entries in the *.vcxproj file with "CUDA 12.0".
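The replacement can be done in any text editor. As an illustration only, here is a minimal C++ sketch of the same search-and-replace. The function names replace_all and retarget_project_file are hypothetical, and the sketch assumes the project file is small enough to read into memory; back up the file before overwriting it.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return text;  // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Hypothetical helper: rewrites the project file at `path` in place,
// switching every "CUDA 11.4" entry to "CUDA 12.0".
// Returns false if the file cannot be read or written.
bool retarget_project_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::stringstream buffer;
    buffer << in.rdbuf();  // read the whole file unchanged
    in.close();

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << replace_all(buffer.str(), "CUDA 11.4", "CUDA 12.0");
    return static_cast<bool>(out);
}
```

Close the solution in Visual Studio before editing the file, then reopen it so the project is reloaded with the changed entries.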