Python gunicorn + CUDA: Cannot re-initialize CUDA in forked subprocess

Posted by pcww981p on 2022-12-28 in Python

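I am building an inference service with torch, gunicorn, and flask that should use CUDA. To reduce resource requirements, I use gunicorn's preload option so that the model is shared between the worker processes. However, this leads to a problem with CUDA. The following code snippet shows a minimal reproducing example: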

from flask import Flask, request
import torch

app = Flask('dummy')

model = torch.rand(500)
model = model.to('cuda:0')

@app.route('/', methods=['POST'])
def f():
    data = request.get_json()
    x = torch.rand((data['number'], 500))
    x = x.to('cuda:0')
    res = x * model
    return {
        "result": res.sum().item()
    }

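Starting the server with CUDA_VISIBLE_DEVICES=1 gunicorn -w 3 -b $HOST_IP:8080 --preload run_server:app lets the service start successfully. However, once the first request is made (curl -X POST -d '{"number": 1}'), the worker process raises the following error: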

[2022-06-28 09:42:00,378] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/user/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/.local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/user/project/run_server.py", line 14, in f
    x = x.to('cuda:0')
  File "/home/user/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 195, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

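I load the model in the parent process, and each forked worker process can access it. The problem occurs when a CUDA-backed tensor is created in a worker process. This re-initializes the CUDA context in the worker process, which fails because it was already initialized in the parent process. If we set x = data['number'] and remove x = x.to('cuda:0'), inference succeeds.

Adding torch.multiprocessing.set_start_method('spawn') or multiprocessing.set_start_method('spawn') does not change anything, probably because gunicorn will definitely use fork when started with the --preload option.

One solution could be to not use the --preload option, which leads to multiple copies of the model in memory and on the GPU. But this is what I am trying to avoid.

Is there any way to overcome this issue *without* loading the model separately in each worker process?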

pgvzfuti #1

You can use gevent instead of gunicorn; that is how I solved this problem.
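The answer does not show any code, so the following is only a minimal sketch of one way to read it: serve the same Flask app from gevent's built-in `WSGIServer` instead of launching it through gunicorn. The server runs inside the single process that loaded the model, so nothing is forked after CUDA has been initialized. The `serve` helper, the `DEVICE` constant, and the CPU fallback are assumptions added for illustration and are not part of the original post.

```python
from flask import Flask, request
from gevent.pywsgi import WSGIServer
import torch

app = Flask('dummy')

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# The "model" is created once, in the only process that will serve requests.
model = torch.rand(500).to(DEVICE)


@app.route('/', methods=['POST'])
def f():
    data = request.get_json()
    # CUDA is used in the same process that initialized it, so no
    # re-initialization in a forked child takes place.
    x = torch.rand((data['number'], 500)).to(DEVICE)
    res = x * model
    return {"result": res.sum().item()}


def serve(host='0.0.0.0', port=8080):
    """Hypothetical helper: run the app on gevent's WSGI server (blocks)."""
    WSGIServer((host, port), app).serve_forever()
```

Calling `serve()` under an `if __name__ == "__main__":` guard and starting the script with `CUDA_VISIBLE_DEVICES=1 python run_server.py` would take the place of the gunicorn command. Keep in mind that this is a single process: requests are handled by greenlets, and a long GPU computation in one request will hold up the others until it finishes.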
