Paddle FatalError: Segmentation fault

sd2nnvve · posted 2021-11-30 in Java
Follow (0) | Answers (26) | Views (1214)
eval model::   3% 10/300 [00:08<04:12,  1.15it/s]

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
1   paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
2   paddle::imperative::PreparedOp::Run(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap const&, paddle::framework::AttributeMap const&)
3   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvOpKernel<float>, paddle::operators::CUDNNConvOpKernel<double>, paddle::operators::CUDNNConvOpKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4   paddle::operators::CUDNNConvOpKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
5   paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
6   paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
7   paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
9   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
10  paddle::memory::allocation::AutoGrowthBestFitAllocator::FreeIdleChunks()
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo:***Aborted at 1636257571 (unix time) try "date -d @1636257571" if you are using GNU date***]
  [SignalInfo:***SIGSEGV (@0x28) received by PID 960 (TID 0x7f26d386d780) from PID 40***]
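Incidentally, the TimeInfo line above already shows how to decode the abort time; with GNU coreutils `date` it looks like this (using the timestamp from the message):

```shell
# Decode the unix timestamp from the TimeInfo line (GNU coreutils date):
date -u -d @1636257571
```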

I don't know where the problem is. I have searched for many solutions, but none of them worked. Can you help me take a look?

k4aesqcs16#

So, how did you get this log? I need the full log, but you only pasted the tail of the log file.

pkbketx917#

@GuoxiaWang
I'm sorry, but I pasted this code into Google Colab and it didn't display anything.
Thank you so much.

rbpvctlc18#

@dang-nh194423

Yes, but please enable GLOG_v and the C++ call stack first.


# in your start python training script

import os
os.environ['GLOG_v']="3"
os.environ['FLAGS_call_stack_level']="2"
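One detail worth noting (not stated above): these variables are read when the Paddle C++ runtime initializes, so they should be set before the first `import paddle`, at the very top of the script. A minimal sketch:

```python
import os

# Set the logging flags before paddle is first imported; the C++ runtime
# reads them at initialization time, so setting them later has no effect.
os.environ['GLOG_v'] = "3"                  # VLOG verbosity level
os.environ['FLAGS_call_stack_level'] = "2"  # include the C++ call stack in errors

# ... then `import paddle` and run the training / eval code as usual
```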
ugmeyewa19#

@GuoxiaWang
Is this the log?
eval.log

r7knjye220#

@dang-nh194423

The log only shows the C++ runtime output.

Can you attach the full log file?

798qvoo821#

@GuoxiaWang
I pasted this code and tried running again, but it still only displays the output below, and I don't know what happened.

Streaming output truncated to the last 5000 lines.
I1108 00:19:12.636030   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.636104   373 tracer.cc:139] Trace Op: adam
I1108 00:19:12.636132   373 prepared_operator.cc:111] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I1108 00:19:12.636152   373 adam_op.cc:72] dims of Beta1Pow : [1]
I1108 00:19:12.636162   373 adam_op.cc:79] dims of Beta2Pow : [1]
I1108 00:19:12.636175   373 adam_op.cu:191] beta1_pow.numel() : 1beta2_pow.numel() : 1
I1108 00:19:12.636185   373 adam_op.cu:193] param.numel(): 512
I1108 00:19:12.636214   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.636284   373 tracer.cc:139] Trace Op: adam
I1108 00:19:12.636339   373 prepared_operator.cc:111] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I1108 00:19:12.636368   373 adam_op.cc:72] dims of Beta1Pow : [1]
I1108 00:19:12.636405   373 adam_op.cc:79] dims of Beta2Pow : [1]
I1108 00:19:12.636438   373 adam_op.cu:191] beta1_pow.numel() : 1beta2_pow.numel() : 1
I1108 00:19:12.636463   373 adam_op.cu:193] param.numel(): 512
I1108 00:19:12.636495   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.636613   373 tracer.cc:139] Trace Op: adam
I1108 00:19:12.636658   373 prepared_operator.cc:111] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I1108 00:19:12.636694   373 adam_op.cc:72] dims of Beta1Pow : [1]
I1108 00:19:12.636705   373 adam_op.cc:79] dims of Beta2Pow : [1]
I1108 00:19:12.636721   373 adam_op.cu:191] beta1_pow.numel() : 1beta2_pow.numel() : 1
I1108 00:19:12.636731   373 adam_op.cu:193] param.numel(): 2359296
I1108 00:19:12.636761   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.636880   373 tracer.cc:139] Trace Op: adam
I1108 00:19:12.636907   373 prepared_operator.cc:111] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I1108 00:19:12.636927   373 adam_op.cc:72] dims of Beta1Pow : [1]
I1108 00:19:12.636937   373 adam_op.cc:79] dims of Beta2Pow : [1]
I1108 00:19:12.636952   373 adam_op.cu:191] beta1_pow.numel() : 1beta2_pow.numel() : 1
I1108 00:19:12.636961   373 adam_op.cu:193] param.numel(): 512
I1108 00:19:12.636989   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.637060   373 tracer.cc:139] Trace Op: adam
I1108 00:19:12.637089   373 prepared_operator.cc:111] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I1108 00:19:12.637122   373 adam_op.cc:72] dims of Beta1Pow : [1]
I1108 00:19:12.637133   373 adam_op.cc:79] dims of Beta2Pow : [1]
I1108 00:19:12.637147   373 adam_op.cu:191] beta1_pow.numel() : 1beta2_pow.numel() : 1
I1108 00:19:12.637157   373 adam_op.cu:193] param.numel(): 512
I1108 00:19:12.637187   373 tracer.cc:209] No Grad to track for Op: adam
I1108 00:19:12.637259   373 tracer.cc:139] Trace Op: adam

Can you help me?
Thank you. 

6mzjoqzu22#

Those lines export environment variables in a Linux terminal.

You can also set them from Python:


# GLOG_v means VLOG level

# FLAGS_call_stack_level means C++ call stack

import os
os.environ['GLOG_v']="3"
os.environ['FLAGS_call_stack_level']="2"
vjhs03f723#

@GuoxiaWang
Excuse me,
Can you explain in more detail? I'm using Google Colab to train the pretrained PPOCR model.

export GLOG_v=3
export FLAGS_call_stack_level=2

Is this the code? I tried pasting it into my notebook, but it didn't display anything!
Thank you.

krugob8w24#

@dang-nh194423

Please use the VLOG to get more info:

export GLOG_v=3
export FLAGS_call_stack_level=2
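To capture the full log as a file (which is what is being asked for here), one option is to set the flags for a single run and pipe both stdout and stderr through `tee`. A sketch, where `train.py` is a placeholder for whatever script you actually run:

```shell
# GLOG_v / FLAGS_call_stack_level are set for this run only.
# 2>&1 merges stderr (where glog writes) into stdout; tee both prints the
# output and saves it to full.log so the file can be attached to the issue.
GLOG_v=3 FLAGS_call_stack_level=2 python train.py 2>&1 | tee full.log
```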
And you can put your code below.

js81xvg625#

@GuoxiaWang Yes, I'm here

polhcujo26#

@dang-nh194423

Please use the VLOG to get more info:

export GLOG_v=3
export FLAGS_call_stack_level=2

And you can put your code below.
