如何修复Pytorch中的“RuntimeError:CUDA error:device-side assert triggered”

jchrr9hc  于 12个月前  发布在  其他
关注(0)|答案(4)|浏览(129)

我试图在我的自定义形状数据集上从这个存储库https://github.com/eriklindernoren/PyTorch-YOLOv3训练yolo-v3模型,但我一直得到错误“RuntimeError:CUDA error:device-side assert triggered”
我试着查找解决方案,并尝试了不同答案中建议的几件事(比如修复注解中类的索引),但错误仍然存在。
我按照repo的自述文件中的描述在自定义数据集上进行训练,并相应地调整了custom.data和data/custom/。
我一直收到这个输出。

C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [37,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [38,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [40,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [41,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [42,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [43,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [44,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [45,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [5,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [6,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [7,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [12,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [13,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [14,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [15,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [20,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [21,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [22,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [25,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [26,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [28,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [29,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    loss, outputs = model(imgs, targets)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 259, in forward
    x, layer_loss = module[0](x, targets, img_dim)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 188, in forward
    ignore_thres=self.ignore_thres,
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 318, in build_targets
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 199, in bbox_iou
    b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
RuntimeError: CUDA error: device-side assert triggered

字符串
唯一改变的是数组索引中的“2”,当处理train.jpg类标签索引时

b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2

rkue9o1l

rkue9o1l1#

我在google colab上训练resnet模型时也遇到了这个问题。所以在我的例子中,我在7个类上训练模型,但我的网络的最后一层设置为输出3个类。
所以我改变了

self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 3))

字符串

self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 7))


在这样做之后,我仍然得到错误,因为我没有重新启动google colab。

记住,每当你得到这个错误检查两件事:

1.class_labels应该从0开始,在我的例子中,[0,1,2,3,4,5,6]是7个类。
1.检查最终输出层是否输出准确的类数
在那之后

刷新notebook以刷新所有cudaAssert。

在任何CUDA错误之后,重新启动笔记本电脑,否则您将继续收到CUDA错误,因为之前的Assert尚未被刷新。通过重新启动笔记本电脑,您将刷新所有CUDAAssert。

jv2fixgn

jv2fixgn2#

通常,当你得到神秘的CUDA错误时,你应该切换到CPU,看看是否会得到更有意义的错误消息。
或者,设置CUDA_LAUNCH_BLOCKING=1以获得更多信息的堆栈跟踪(有关详细信息,请参阅this answer)。
请参阅this answer了解更多详细信息。
我猜,在你的例子中,看起来你除以2会创建分数,而pytorch会查找整数。

b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] // 2, box1[:, 0] + box1[:, 2] // 2

字符串

zpjtge22

zpjtge223#

1.在我的例子中,我让class_labels从0开始:在我的例子中,[0,1,2,3,4]是5个类。
1.然后我刷新了谷歌笔记本,它工作了。

3j86kqsm

3j86kqsm4#

将“set TORCH_USE_CUDA_DSA”添加到您的webui-user.bat文件可以修复此错误吗?我刚刚得到它,并将其设置在我的设置下面,如下所示:
set COMMANDLINE_ARGS= --no-download-sd-model --disable-nan-check --skip-version-check --no-half-disabled--upcast-sampling--opt-sdp-attention --xformers --force-enable-xformers --autolaunch set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 set TORCH_USE_CUDA_DSA

相关问题