pytorch: Why does my ONNXRuntime inference crash on GPU without any log?

Posted by zlwx9yxi on 2022-11-23 in Other

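I am trying to run an ONNX model for image segmentation in C#; the model was created with PyTorch in Python. Everything works when I run it on the CPU, but when I use the GPU my application crashes as soon as it tries to run inference. (Inference on the GPU from Python works fine.)
The only information I have is an event in the Windows 10 Event Viewer:
Faulting application name: DeepLearningONNX.exe, version: 1.0.0.0, time stamp: 0x6331eb0e
Faulting module name: cudnn64_8.dll, version: 6.14.11.6050, time stamp: 0x62e9c226
Exception code: 0xc0000409
Fault offset: 0x000000000001420d
Faulting process id: 0x2cc0
Faulting application start time: 0x01d8f830aac6f0a2
Faulting application path: C:\R&D\DeepLearningONNX\DeepLearningONNX\bin\x64\Debug\net6.0-windows\DeepLearningONNX.exe
Faulting module path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\cudnn64_8.dll
Report Id: 40803e1a-e84d-4645-bfb6-4ebbb6ba1b78
Faulting package full name:
Faulting package-relative application ID: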
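
The exception code in the event above can be looked up by hand. As a minimal sketch, the helper below maps a few well-known NTSTATUS values to their symbolic names. The table is a small hand-picked subset, and `describe_exception_code` is a hypothetical helper written for illustration, not part of any Windows or ONNX Runtime API.

```python
# A few well-known NTSTATUS codes (hand-picked subset, not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exception_code(text: str) -> str:
    """Return the symbolic name for a hex exception code string.

    Spaces are stripped first, so a code that was split up when it was
    copied (for example "0xc 0000409") is still parsed.
    """
    code = int(text.replace(" ", ""), 16)
    return KNOWN_NTSTATUS.get(code, f"unknown (0x{code:08X})")


print(describe_exception_code("0xc0000409"))
```

Here the code resolves to STATUS_STACK_BUFFER_OVERRUN, a status Windows also reports for fail-fast terminations. The process is ended immediately in that case, which would be consistent with no managed exception or log being produced.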

My hardware:

NVIDIA Quadro P620 (4 GB), driver 31.0.15.1740
Intel Core i7-10850H
Windows 10 22H2, OS build 19045.2251

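In my system environment variables:

CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
CUDA_PATH_V11_6: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6

PATH: C:\Program Files\NVIDIA\CUDNN\v8.5;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp
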
In my C# (.NET 6) solution, the installed NuGet package:

Microsoft.ML.OnnxRuntime.Gpu version 1.13.1

Installed software:

Visual Studio Community 2022 (64-bit), version 17.3.6
cuda_11.6.2_511.65_windows.exe

cuDNN zip archive extracted to C:\Program Files\NVIDIA\CUDNN\v8.5

My C# code:

// File-level using directives needed by this method:
// using System;
// using System.Collections.Generic;
// using Microsoft.ML.OnnxRuntime;
// using Microsoft.ML.OnnxRuntime.Tensors;

private void InferenceDebug(string modelPath, bool useGPU)
{
    InferenceSession session;

    if (useGPU)
    {
        // Configure the CUDA execution provider for GPU 0 with a 2 GiB memory limit.
        var cudaProviderOptions = new OrtCUDAProviderOptions();
        var providerOptionsDict = new Dictionary<string, string>();
        providerOptionsDict["device_id"] = "0";
        providerOptionsDict["gpu_mem_limit"] = "2147483648";
        providerOptionsDict["arena_extend_strategy"] = "kSameAsRequested";
        providerOptionsDict["cudnn_conv_algo_search"] = "DEFAULT";
        providerOptionsDict["do_copy_in_default_stream"] = "1";
        providerOptionsDict["cudnn_conv_use_max_workspace"] = "1";
        providerOptionsDict["cudnn_conv1d_pad_to_nc1d"] = "1";

        cudaProviderOptions.UpdateOptions(providerOptionsDict);

        SessionOptions options = SessionOptions.MakeSessionOptionWithCudaProvider(cudaProviderOptions);
        session = new InferenceSession(modelPath, options);
    }
    else
    {
        // Default session: CPU execution provider only.
        session = new InferenceSession(modelPath);
    }

    // Build a random 1x3x128x128 (NCHW) input tensor.
    int w = 128;
    int h = 128;
    Tensor<float> input = new DenseTensor<float>(new int[] { 1, 3, h, w });
    Random random = new Random(42);

    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            input[0, 0, y, x] = (float)(random.NextDouble() / 255);
            input[0, 1, y, x] = (float)(random.NextDouble() / 255);
            input[0, 2, y, x] = (float)(random.NextDouble() / 255);
        }
    }

    // The input name must match the one used when exporting the model.
    var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor<float>("modelInput", input) };
    using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs); // The crash happens when executing this line
}

My Python code (Python 3.10, 64-bit):

import torch  # version '1.12.1+cu116'
from torch import nn
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import DiceLoss


class SegmentationModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.arc = smp.UnetPlusPlus(encoder_name='timm-efficientnet-b0',
                                    encoder_weights='imagenet',
                                    in_channels=3,
                                    classes=1,
                                    activation=None)

    def forward(self, images, masks=None):
        logits = self.arc(images)

        # During training, also return the combined Dice + BCE loss.
        if masks is not None:
            loss1 = DiceLoss(mode='binary')(logits, masks)
            loss2 = nn.BCEWithLogitsLoss()(logits, masks)
            return logits, loss1 + loss2

        return logits


modelPath = "D:/model.pt"
device = "cuda"  # input("Enter device (cpu or cuda) : ")
model = SegmentationModel()
model.to(device)
model.load_state_dict(torch.load(modelPath, map_location=torch.device(device)))
model.eval()

dummy_input = torch.randn(1, 3, 128, 128, device=device)

torch.onnx.export(model,                      # model being run
                  dummy_input,                # model input (or a tuple for multiple inputs)
                  "model.onnx",               # where to save the model
                  export_params=True,         # store the trained parameter weights inside the model file
                  do_constant_folding=True,   # whether to execute constant folding for optimization
                  input_names=['modelInput'],     # the model's input names
                  output_names=['modelOutput'],   # the model's output names
                  dynamic_axes={'modelInput': [0, 2, 3],     # variable length axes
                                'modelOutput': [0, 2, 3]})
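
Before debugging the C# side, the exported model.onnx can be run through ONNX Runtime's Python API on the same machine with verbose logging enabled. The sketch below assumes the `onnxruntime-gpu` and `numpy` packages are installed; the helper names `cuda_provider_options` and `run_onnx_on_gpu` are hypothetical, and the provider options simply mirror the ones set in the C# code above.

```python
def cuda_provider_options(device_id: int = 0, mem_limit_bytes: int = 2 * 1024**3) -> dict:
    """CUDA execution provider options mirroring the C# dictionary."""
    return {
        "device_id": device_id,
        "gpu_mem_limit": mem_limit_bytes,
        "arena_extend_strategy": "kSameAsRequested",
        "cudnn_conv_algo_search": "DEFAULT",
        "do_copy_in_default_stream": True,
    }


def run_onnx_on_gpu(model_path: str = "model.onnx"):
    """Run one random 1x3x128x128 input through the model and return the output shape."""
    # Imported lazily so the helper above stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.log_severity_level = 0  # 0 = verbose logging

    session = ort.InferenceSession(
        model_path,
        sess_options=so,
        providers=[("CUDAExecutionProvider", cuda_provider_options()),
                   "CPUExecutionProvider"],
    )
    # If CUDA could not be initialized, only the CPU provider is listed here.
    print("Active providers:", session.get_providers())

    x = np.random.rand(1, 3, 128, 128).astype(np.float32)
    (output,) = session.run(["modelOutput"], {"modelInput": x})
    return output.shape


# Example call (requires a GPU build of ONNX Runtime and the exported file):
# print(run_onnx_on_gpu("model.onnx"))
```

If this also fails, the problem is more likely in the CUDA/cuDNN installation than in the C# code.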

What is causing the crash, and how can I fix it?

pytorch

Source: https://stackoverflow.com/questions/74434017/why-do-my-onnxruntime-inference-crash-on-gpu-without-any-log

1 Answer

erhoui1w1#

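I found my mistake. I had forgotten to download zlib, as mentioned here: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#prerequisites-windows
After adding the path of the folder containing zlibwapi.dll to my PATH environment variable, everything worked.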
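
A quick way to confirm this kind of setup problem is to check which PATH directories actually contain the required DLLs. The following is a minimal sketch; `find_on_path` is a hypothetical helper, and it inspects PATH only, whereas Windows also searches other locations such as the application directory when loading a DLL.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_path(file_name: str, path_value: Optional[str] = None) -> List[str]:
    """Return every directory listed in PATH that contains file_name."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # tolerate quoted or padded entries
        if entry and (Path(entry) / file_name).is_file():
            hits.append(entry)
    return hits


# DLL names taken from the question and the answer above.
for dll in ("zlibwapi.dll", "cudnn64_8.dll"):
    found = find_on_path(dll)
    print(dll, "->", found if found else "NOT FOUND on PATH")
```

An empty result for zlibwapi.dll would match the situation described in this answer.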

Answered 2022-11-23