CUDA: unable to see shared memory values in Nsight debugging

Posted by 6yt4nkrj on 2023-05-07 in Other


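I have been struggling with a problem for a while now and can't seem to find a solution.
The problem is that when I try to debug my CUDA code with NVIDIA Nsight under Visual Studio 2008, I get strange results whenever shared memory is used.
My code is: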

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   /**** Breakpoint (1) here ****/
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   T v = 0; // Value returned by integrate()
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   return v;
}

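When I reach Breakpoint (1) and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array have changed; the others are still null. I expect all of the first 64 to change.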

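I thought it might have something to do with not all warps executing at the same time, so I tried to synchronize them. I added this code to integrate():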

template<typename T>
__device__
T integrate()
{
   /* Old code is still here */

   __syncthreads();
   /**** Breakpoint (2) here ****/
   if(index < 64 && blockIdx.x==0) {
      T tmp = s_test[index]; // Write to tmp variable so I can inspect it inside Nsight Watch window
      v = tmp + index; // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
   }
   return v;
}

But the problem remains. Furthermore, the rest of the values in tmp are not 0, as the VS Watch window indicates.

I should mention that stepping over __syncthreads() takes a lot of steps, so when I reach it I just jump to Breakpoint (2). What is going on?

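EDIT: System / launch configuration information
System

Name: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Architecture: x86
Frequency: 2,666 MHz
Number of cores: 2
Page size: 4,096 bytes
Total physical memory: 3,582.00 MB
Available physical memory: 1,983.00 MB
Version name: Windows 7 Ultimate
Version number: 6.1.7600
Device: GeForce 9500 GT
Driver version: 301.42
Driver model: WDDM
CUDA device index: 0
GPU family: G96
Compute capability: 1.1
Number of SMs: 4
Frame buffer physical size (MB): 512
Frame buffer bandwidth (GB/s): 16
Frame buffer bus width (bits): 128
Frame buffer location: Dedicated
Graphics clock (MHz): 812
Memory clock (MHz): 500
Processor clock (MHz): 1625
RAM type: DDR2
IDE
Microsoft Visual Studio Team System 2008
NVIDIA Nsight Visual Studio Edition, version 2.2, build 2.2.0.12255
Compiler command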

1> "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\\bin\nvcc.exe"  -G  -gencode=arch=compute_10,code=\"sm_10,compute_10\"   --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g  -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\\include" -maxrregcount=0  --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

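Launch configuration: the size of the shared memory doesn't seem to matter; I have tried several values. The one I have worked with most is:

Shared memory: 2048 bytes
Grid/block size: {101, 101, 1}, {16, 16, 1}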
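For reference, a launch with this configuration can be written out on the host as below. This is a sketch under stated assumptions: the kernel `countSharedWrites` and the helper `launchLikeQuestion` are hypothetical, T is taken to be float, and every block (not only `blockIdx.x == 0`) performs the test writes. Only the grid size, block size and 2048-byte dynamic shared-memory size come from the question; the shared-memory size is passed as the third `<<< >>>` argument. A 16x16 block has 256 threads, i.e. 8 warps of 32, so `index < 64` selects only the first two warps, and 2048 bytes hold 512 floats.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical test kernel (not the asker's integrate<T>()): the first 64 threads
// of each block write their linear index into dynamic shared memory, the whole
// block synchronizes, and thread 0 counts how many of the 64 slots are correct.
__global__ void countSharedWrites(int* matches)
{
    extern __shared__ float s_test[];                    // sized by the 3rd launch argument
    int index = threadIdx.x + threadIdx.y * blockDim.x;  // 0..255 for a 16x16 block

    if (index < 64)                                      // first two warps only
        s_test[index] = (float)index;
    __syncthreads();                                     // reached by all 256 threads

    if (index == 0) {
        int count = 0;
        for (int i = 0; i < 64; ++i)
            if (s_test[i] == (float)i) ++count;
        matches[blockIdx.x + blockIdx.y * gridDim.x] = count;  // one result per block
    }
}

// Launches with the question's configuration; returns true if every block
// observed all 64 values after the barrier.
bool launchLikeQuestion()
{
    const dim3 grid(101, 101, 1);       // 10201 blocks
    const dim3 block(16, 16, 1);        // 256 threads = 8 warps of 32
    const size_t sharedBytes = 2048;    // 512 floats, more than the 64 used
    const int numBlocks = (int)(grid.x * grid.y);

    int* d_matches = nullptr;
    if (cudaMalloc(&d_matches, numBlocks * sizeof(int)) != cudaSuccess)
        return false;

    countSharedWrites<<<grid, block, sharedBytes>>>(d_matches);
    if (cudaGetLastError() != cudaSuccess) {             // catches launch-configuration errors
        cudaFree(d_matches);
        return false;
    }

    std::vector<int> matches(numBlocks, 0);
    cudaError_t err = cudaMemcpy(matches.data(), d_matches,
                                 numBlocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_matches);
    if (err != cudaSuccess)
        return false;

    for (int count : matches)
        if (count != 64) return false;
    return true;
}
```

Because the per-block counts are copied back to the host, this checks what actually landed in shared memory without relying on how the debugger's Watch window displays it.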


Source: https://stackoverflow.com/questions/12695533/cuda-unable-to-see-shared-memory-values-in-nsight-debugging

1 answer


7rfyedvj1#

Have you tried putting __syncthreads() after the assignment?

template<typename T>
__device__
T integrate()
{
   extern __shared__ T s_test[]; // Dynamically allocated shared memory
   int index = threadIdx.x + threadIdx.y * blockDim.x; // Local index in block. Column major ordering
   T v = 0; // Value returned by integrate()
   if(index < 64 && blockIdx.x==0) { // Only work on a few values. Just testing
      s_test[index] = (T)index;
      /* Some other irrelevant code here */
   }
   __syncthreads();
   /**** Breakpoint (1) here ****/
   return v;
}

Then try inspecting the values at this breakpoint.
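The answer's suggestion can be made concrete with a small compilable sketch. It is hypothetical: `integrateMirrored`, `mirroredKernel` and `runMirrored` are illustrative names, T is fixed to float, and the host wrapper exists only so the result can be checked. Each of the first 64 threads reads the slot written by thread `63 - index`, which belongs to the other warp, so the read is only guaranteed to see the written value after every thread has passed the barrier. As in the answer, `__syncthreads()` sits outside the `if`: CUDA only allows the barrier in conditional code when the condition evaluates the same way for the whole block.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant of the answer's integrate<T>(): threads 0..63 write their
// index, then read the slot written by thread (63 - index), i.e. by the other warp.
template<typename T>
__device__ T integrateMirrored()
{
    extern __shared__ T s_test[];                        // dynamically allocated shared memory
    T v = 0;
    int index = threadIdx.x + threadIdx.y * blockDim.x;  // linear index in the block

    if (index < 64 && blockIdx.x == 0)
        s_test[index] = (T)index;
    __syncthreads();                                     // outside the if: all threads reach it

    if (index < 64 && blockIdx.x == 0)
        v = s_test[63 - index];                          // corresponds to the answer's Breakpoint (1)
    return v;
}

__global__ void mirroredKernel(float* out)
{
    int index = threadIdx.x + threadIdx.y * blockDim.x;
    float v = integrateMirrored<float>();
    if (index < 64 && blockIdx.x == 0)
        out[index] = v;                                  // storing v keeps it from being optimized out
}

// Runs one 16x16 block with 2048 bytes of dynamic shared memory and checks
// that thread i read back the value 63 - i.
bool runMirrored()
{
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, 64 * sizeof(float)) != cudaSuccess)
        return false;

    mirroredKernel<<<1, dim3(16, 16, 1), 2048>>>(d_out);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(d_out);
        return false;
    }

    float h[64];
    cudaError_t err = cudaMemcpy(h, d_out, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return false;

    for (int i = 0; i < 64; ++i)
        if (h[i] != (float)(63 - i)) return false;
    return true;
}
```

Reading another warp's slot, rather than the thread's own, is what makes the barrier observable: without `__syncthreads()` there is no guarantee that thread 63 has written slot 63 by the time thread 0 reads it.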

Answered 2023-05-07
