std::unique_ptr<paddle_infer::Predictor> Clone() in the paddle_inference library does not appear to be thread-safe

Posted by s71maibg 5 months ago in Other



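The std::unique_ptr<paddle_infer::Predictor> Clone() function in the paddle_inference library does not appear to be thread-safe.
When several predictors copied with Predictor::Clone() run inference at the same time, the engine reports an error. Inference runs on the CPU.
For example, with 10 threads running at once, each using a predictor cloned from the original Predictor object, the error does not occur right away: several rounds of speech synthesis may complete normally before it fails.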

Inference library version:
GIT COMMIT ID: b031c38
WITH_MKL: OFF
WITH_MKLDNN: OFF
WITH_GPU: OFF
WITH_ROCM: OFF
CXX compiler version: 5.4.0

System: Debian GNU/Linux 9
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
gcc version 6.3.0
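
The threading setup described above can be pictured with a small stand-alone sketch. `FakePredictor`, its `Run` signature, and `run_on_clones` are hypothetical stand-ins invented for illustration; they are not the Paddle API and cannot reproduce the crash. The sketch only shows the ownership pattern from the report: one clone per thread, each thread feeding a different input length (`idx + 5`, as in the reproducer posted later in the thread).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for paddle_infer::Predictor. It only mimics Clone()
// and a Run()-like call so the pattern compiles without the Paddle library.
class FakePredictor {
 public:
  std::unique_ptr<FakePredictor> Clone() const {
    return std::unique_ptr<FakePredictor>(new FakePredictor());
  }
  // Pretend inference: sums the first `len` token ids.
  long Run(const std::vector<long>& tokens, std::size_t len) const {
    assert(len <= tokens.size());
    return std::accumulate(tokens.begin(),
                           tokens.begin() + static_cast<std::ptrdiff_t>(len),
                           0L);
  }
};

// One clone per thread; thread i (idx = i + 1) uses an input of length idx + 5.
std::vector<long> run_on_clones(const FakePredictor& original,
                                const std::vector<long>& tokens,
                                int num_threads) {
  std::vector<std::unique_ptr<FakePredictor>> clones;
  for (int i = 0; i < num_threads; ++i) {
    clones.push_back(original.Clone());
  }

  std::vector<long> results(static_cast<std::size_t>(num_threads), 0L);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&, i] {
      const std::size_t len = static_cast<std::size_t>(i + 1 + 5);
      // Each thread touches only its own clone and its own result slot.
      results[static_cast<std::size_t>(i)] =
          clones[static_cast<std::size_t>(i)]->Run(tokens, len);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return results;
}
```

Calling `run_on_clones(original, tokens, 10)` with at least 15 tokens mirrors the 10-thread case in the report. With the real library, each clone would be driven through its input and output handles instead of this toy `Run`.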


Source: https://github.com/PaddlePaddle/Paddle/issues/42607

5 answers


fcg9iug31#

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also look for an answer in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

5 months ago

w8f9ii692#

Please provide the error log. If possible, attach a unit test as well; that would help us analyze the problem.

5 months ago

dfuffjeb3#

After repeated testing I found that when several predictors produced by Predictor::Clone() run inference in multiple threads, the input dimensions must be the same for every thread's predictor; otherwise the program crashes. See the attachment for an example and the error log. Error log:

terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what(): Compile Traceback (most recent call last):
File "07_synthesize_e2e_8k.py", line 238, in
  main()
File "07_synthesize_e2e_8k.py", line 234, in main
  evaluate(args)
File "07_synthesize_e2e_8k.py", line 155, in evaluate
  paddle.jit.save(am_inference, os.path.join(args.inference_dir, args.am))
File "", line 2, in save
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
  return wrapped_func(args, kwargs)
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 40, in impl
  return func(args, kwargs)
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 744, in save
  inner_input_spec)
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 517, in concrete_program_specify_input_spec
  desired_input_spec)
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 427, in get_concrete_program
  concrete_program, partial_program_layer = self._program_cache[cache_key]
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 744, in getitem
  self._caches[item] = self._build_once(item)
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 735, in _build_once
  cache_key.kwargs)
File "", line 2, in from_func_spec
File "/home/zx_wangshuaixin/local/anaconda3/lib/python3.7/site-packa [truncated]

5 months ago

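dkqlctbz4#

Is "the input dimensions must be the same for every thread's predictor, otherwise it crashes" the same issue as the first problem you raised? The first problem may be memory-related, and the log provided for the second one does not reveal the cause. Please provide a concrete unit test so the developers can locate the problem.

5 months ago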

enxuqcxy5#

It is the same problem. The test program is below and can be used to reproduce it.
=================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// The names of seven headers were lost when this page was extracted;
// the seven below are the ones this program needs.
#include <cstdint>
#include <fstream>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include <unistd.h>
#include <pthread.h>
#include "timer.h"
#include "paddle/include/paddle_inference_api.h"

using namespace std;
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;

// Per-thread parameters; each thread gets its own copy.
typedef struct task_ {
  // Predictors used for synthesis
  std::shared_ptr<Predictor> pAcouModel;
  std::shared_ptr<Predictor> pVocoModel;
  // Numbered text entries to synthesize
  std::list<std::pair<int, std::string> > texts;
  // Output directory for audio files
  std::string outDir;
  int idx;
} Task;

std::string& trim(std::string &s) {
  if (s.empty()) {
    return s;
  }
  s.erase(0, s.find_first_not_of(" "));
  if (s.find_last_of("\r") != std::string::npos) {
    s.erase(s.find_last_of("\r"));
  } else if (s.find_last_of("\n") != std::string::npos) {
    s.erase(s.find_last_of("\n"));
  }
  return s;
}

// Thread function
void* proc(void *arg) {
  Task *t = (Task*)arg;
  std::shared_ptr<Predictor> m_acou = t->pAcouModel;
  std::shared_ptr<Predictor> m_voco = t->pVocoModel;
  int count = 0;
  pthread_t tid = pthread_self();

  // Take one entry at a time from the tasks assigned to this thread in advance.
  for (std::list<std::pair<int, std::string> >::iterator it = t->texts.begin();
       it != t->texts.end(); ++it) {
    std::string line = it->second;
    // Name of the generated audio file. Two input formats exist: (1) every line
    // is text to synthesize; (2) every line is a file name, a space, then the text.
    char name[64];
    snprintf(name, sizeof(name), "%s/%08d.pcm", t->outDir.c_str(), it->first);
    // The text format is passed as a program argument: 1 if the text
    // contains file names, 0 otherwise.
    char *text = strdup(line.c_str());

    auto input_acou = m_acou->GetInputHandle("text");
    auto output_acou = m_acou->GetOutputHandle("elementwise_add_41");
    auto input_voco = m_voco->GetInputHandle("logmel");
    auto output_voco = m_voco->GetOutputHandle("transpose_69.tmp_0");

    // The vector element types were stripped in extraction; int64_t, int and
    // float below are assumed.
    std::vector<int64_t> input_data = {
        68,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113,
        152, 228, 65, 143, 122, 70, 141, 137, 145, 81, 168, 111, 235, 70, 166, 113};
    // Each thread uses a different input length, which is what triggers the crash.
    std::vector<int> input_shape = {t->idx + 5};
    input_acou->Reshape(input_shape);
    input_acou->CopyFromCpu(input_data.data());
    m_acou->Run();

    std::vector<int> output_shape = output_acou->shape();
    std::vector<float> out_data;
    out_data.resize(output_shape[0] * output_shape[1]);
    output_acou->CopyToCpu(out_data.data());
    printf("acoustic model out shape: [%d][%d], thread id:[%d]\n",
           output_shape[0], output_shape[1], t->idx);

    input_voco->Reshape(output_shape);
    input_voco->CopyFromCpu(out_data.data());
    m_voco->Run();

    std::vector<int> output_shape_2 = output_voco->shape();
    std::vector<float> output;
    output.resize(output_shape_2[0]);
    output_voco->CopyToCpu(output.data());
    printf("vocoder model out shape: [%d][%d], thread id:[%d]\n",
           output_shape_2[0], output_shape_2[1], t->idx);

    free(text);  // release the strdup() copy
  }
  return NULL;
}

std::shared_ptr<Predictor> pModelLoad(const char *modelPath, const char *weightPath) {
  Config config;
  config.SetModel(modelPath, weightPath);
  config.EnableMemoryOptim();       // enable memory / GPU memory reuse
  // config.EnableMKLDNN();         // enable MKLDNN inference
  // config.EnableUseGpu(100, 0);   // enable GPU inference
  return CreatePredictor(config);
}

std::shared_ptr<Predictor> getPredictor(const char *acouModelDir) {
  std::string path = acouModelDir;
  path += "/";
  std::string modelParams = path + "model.pdiparams";
  std::string modelStruct = path + "model.pdmodel";
  auto pAcouModel = pModelLoad(modelStruct.c_str(), modelParams.c_str());
  return pAcouModel;
}

int main(int argc, char *argv[]) {
  if (argc < 6) {
    printf("%s acouModelDir vocoModelDir fileGBKencode threads outdir\n", argv[0]);
    return 1;
  }
  int threads = atoi(argv[4]);
  char *outPath = argv[5];
  std::shared_ptr<Predictor> pAcouModel = getPredictor(argv[1]);
  std::shared_ptr<Predictor> pVocoModel = getPredictor(argv[2]);
  Timer timer;

  // Create the requested number of threads; each gets its own cloned predictors.
  pthread_t *id = new pthread_t[threads];
  Task *ts = new Task[threads];
  for (int i = 0; i < threads; ++i) {
    ts[i].outDir = outPath;
    ts[i].pAcouModel = pAcouModel->Clone();
    ts[i].pVocoModel = pVocoModel->Clone();
    if (ts[i].pAcouModel == nullptr || ts[i].pVocoModel == nullptr) {
      threads = i;
      break;
    }
    ts[i].idx = i + 1;
  }
  printf("init thread count=%d \n", threads);

  // Read the GBK-encoded input text; each line is one synthesis unit,
  // assigned to the threads in round-robin order.
  std::ifstream is(argv[3]);
  int count = 0;
  int index = 0;
  std::string line;
  while (std::getline(is, line)) {
    ++count;
    ts[index].texts.push_back(std::make_pair(count, line));
    ++index;
    if (index == threads) index = 0;
  }

  printf("create threads num=%d\n", threads);
  for (int i = 0; i < threads; ++i) {
    pthread_create(id + i, NULL, proc, ts + i);
  }
  for (int i = 0; i < threads; ++i) {
    pthread_join(id[i], NULL);
  }
  delete[] id;
  delete[] ts;
  return 0;
}
==================================================

5 months ago
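
One diagnostic that is not proposed anywhere in the thread, but that may help narrow the question of whether this is a thread-safety problem, is to temporarily serialize the inference calls. If the crash disappears when only one clone runs at a time, state shared between clones becomes the more likely suspect. The sketch below is self-contained and uses only the standard library. `SerializedRunner` and `count_with_serialized_runs` are hypothetical names, and the plain counter merely stands in for whatever state the clones might share.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs any callable while holding one shared mutex,
// so at most one wrapped call executes at a time.
class SerializedRunner {
 public:
  template <typename Fn>
  void Run(Fn&& fn) {
    std::lock_guard<std::mutex> lock(mu_);
    fn();
  }

 private:
  std::mutex mu_;
};

// Demonstration: many threads update a non-atomic counter through the runner.
// Because every update happens under the mutex, no increment is lost.
long count_with_serialized_runs(int num_threads, int runs_per_thread) {
  SerializedRunner runner;
  long shared_counter = 0;  // deliberately not atomic
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&] {
      for (int r = 0; r < runs_per_thread; ++r) {
        // In the reproducer, the wrapped call would be the predictor's Run().
        runner.Run([&] { ++shared_counter; });
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return shared_counter;
}
```

Serializing every call removes the benefit of running clones in parallel, so this is only a way to gather evidence, not a fix.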
