eval model:: 3% 10/300 [00:08<04:12, 1.15it/s]
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
1 paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
2 paddle::imperative::PreparedOp::Run(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap const&, paddle::framework::AttributeMap const&)
3 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvOpKernel<float>, paddle::operators::CUDNNConvOpKernel<double>, paddle::operators::CUDNNConvOpKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4 paddle::operators::CUDNNConvOpKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
5 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
6 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
9 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
10 paddle::memory::allocation::AutoGrowthBestFitAllocator::FreeIdleChunks()
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo:***Aborted at 1636257571 (unix time) try "date -d @1636257571" if you are using GNU date***]
[SignalInfo:***SIGSEGV (@0x28) received by PID 960 (TID 0x7f26d386d780) from PID 40***]
I don't know where the problem is. I searched through a lot of existing solutions, but none of them solved it. Could you help me take a look?
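For anyone reading the summary above, here is a minimal sketch of how the `TimeInfo` and `SignalInfo` lines could be decoded. The two strings are copied from the log; the function name `parse_crash_summary` and the regular expressions are only illustrative assumptions, not part of Paddle.

```python
import re
from datetime import datetime, timezone

# Lines copied verbatim from the error summary in the question.
TIME_INFO = '[TimeInfo:***Aborted at 1636257571 (unix time) try "date -d @1636257571" if you are using GNU date***]'
SIGNAL_INFO = "[SignalInfo:***SIGSEGV (@0x28) received by PID 960 (TID 0x7f26d386d780) from PID 40***]"


def parse_crash_summary(time_line, signal_line):
    """Extract the crash time, signal name, fault address and PID (hypothetical helper)."""
    timestamp = int(re.search(r"Aborted at (\d+)", time_line).group(1))
    match = re.search(
        r"\*\*\*(\w+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+)", signal_line
    )
    return {
        "when_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "signal": match.group(1),
        "fault_address": int(match.group(2), 16),
        "pid": int(match.group(3)),
    }


summary = parse_crash_summary(TIME_INFO, SIGNAL_INFO)
print(summary["when_utc"].isoformat())
print(summary["signal"], hex(summary["fault_address"]), summary["pid"])
```

The fault address `0x28` is 40 in decimal, the same number printed after `from PID`. An address this close to zero is often a sign of a field being read through a null pointer, but that is only a guess from the log and not a confirmed diagnosis.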
26 answers
pbwdgjma1#
Thank you so much for your help! Nice to meet you!
6ojccjat2#
@dang-nh194423
There is no more debug info, so I have no idea how to find the real cause.
gc0ot86w3#
So, there is no solution for this error 😢
0ejtzxu14#
@dang-nh194423
Maybe it is an illegal attempt to access an uninitialized tensor.
gpfsuwkq5#
@GuoxiaWang
I use Google Colab, so I haven't pushed it to GitHub.
I want to train from the pretrained model, but I found only one page on how to do it. If you need it, I will share it with you.
xiozqbni6#
@dang-nh194423
What's the repo?
tuwxkamq7#
@dang-nh194423
goucqfw68#
@GuoxiaWang
I only use the one line below 😊
q1qsirdb9#
@dang-nh194423
Can you paste your code?
yptwkmov10#
@GuoxiaWang
Yes, thank you. But what should I do now?
bvuwiixz11#
@dang-nh194423
This indicates that the error happened during GPU memory allocation in a Conv layer.
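To show how that reading follows from the traceback, here is a small sketch that parses frames 4-10 from the question and picks out the innermost frame and the operator kernel frame. The helper `parse_frames` is a hypothetical name, and the sketch assumes that the highest-numbered frame is the innermost call, as the "most recent call last" header suggests.

```python
import re

# Frames 4-10 copied from the traceback in the question.
TRACEBACK = """\
4 paddle::operators::CUDNNConvOpKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
5 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
6 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
9 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
10 paddle::memory::allocation::AutoGrowthBestFitAllocator::FreeIdleChunks()
"""


def parse_frames(text):
    """Return (frame_number, qualified_function_name) pairs, outermost first."""
    frames = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S.*)$", line)
        if match:
            # Drop the argument list and keep only the qualified function name.
            symbol = match.group(2).split("(", 1)[0]
            frames.append((int(match.group(1)), symbol))
    return sorted(frames)


frames = parse_frames(TRACEBACK)
innermost = frames[-1]
kernel = next(frame for frame in frames if "::operators::" in frame[1])
print("innermost frame:", innermost)
print("operator kernel:", kernel)
```

The kernel frame is the cuDNN convolution kernel and the innermost frame is inside the auto-growth allocator, which matches the reply above. If the allocator itself is suspected, one possible experiment (not a confirmed fix) is to set the environment variable `FLAGS_allocator_strategy=naive_best_fit` before starting Python and see whether the crash still occurs.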
34gzjxbg12#
@GuoxiaWang
I don't think that is the problem, because when I evaluate another model (detection with MobileNet), no error occurs. The error only happens when I use ResNet18.
beq87vna13#
@dang-nh194423
What error is it?
i1icjdpr14#
Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please double-check that you have provided a clear problem description, code to reproduce it, your environment and versions, and the error message. You may also check the API docs, FAQ, GitHub Issues, and the AI community for an answer. Have a nice day!
lqfhib0f15#
I can't find a log file for the evaluation. The folder contains only train.log. I read train.log, and I think an eval log would show the same thing, because I got this error when running evaluation of the model.
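In case it helps with locating logs, here is a minimal sketch that lists every `*.log` file under a directory, newest first, and prints the last lines of the most recent one. The directory `./output` is only a placeholder, since the real output path is not given in this thread, and `find_logs` and `tail` are hypothetical helper names.

```python
from pathlib import Path


def find_logs(root, pattern="*.log"):
    """Return log files under root (searched recursively), newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)


def tail(path, n=20):
    """Return the last n lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        return handle.readlines()[-n:]


if __name__ == "__main__":
    # "./output" is an assumed location; replace it with the real save directory.
    logs = find_logs("./output")
    for log in logs:
        print(log)
    if logs:
        print("".join(tail(logs[0])))
```

If only train.log turns up, the evaluation messages may simply have been written to that file or only to the console, so capturing the console output of the evaluation run is another option.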