Version and environment information:

Paddle version: 2.0.0-rc0
Paddle With CUDA: True
OS: Windows 10
Python version: 3.7.0
CUDA version: 10.2.89
cuDNN version: 7.6.5
Nvidia driver version: 457.09

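Training information

1) Single machine, single GPU
2) GPU memory: NVIDIA GeForce RTX 3070, 8.0 GB

The network uses a ResNet50-vd-FPN-DCNv2 backbone with the Cascade Faster R-CNN architecture. The configuration file is as follows: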
architecture: CascadeRCNN
max_iters: 30000
snapshot_iter: 3000
use_gpu: true
log_smooth_window: 20
log_iter: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar
weights: output/cascade_rcnn_dcn_r50_vd_fpn_gen_server_side_traffic4/model_final
metric: VOC
num_classes: 5

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]
  lr_mult_list: [0.05, 0.05, 0.1, 0.15]

FPN:
  max_level: 6
  min_level: 2
  num_chan: 64
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 64
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 500
    post_nms_top_n: 300

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeTwoFCHead
  bbox_loss: BalancedL1Loss
  nms:
    keep_top_k: 100
    nms_threshold: 0.5
    score_threshold: 0.05

BalancedL1Loss:
  alpha: 0.5
  gamma: 1.5
  beta: 1.0
  loss_weight: 1.0

CascadeTwoFCHead:
  mlp_dim: 1024
LearningRate:
  base_lr: 0.0000125
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [24000, 26000]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !VOCDataSet
    anno_path: train.txt
    dataset_dir: dataset/traffic_light4
    use_default_label: false
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !RandomFlipImage
    prob: 0.5
  - !AutoAugmentImage
    autoaug_type: v1
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    target_size: [640, 672, 704, 736, 768, 800, 832, 864, 896, 928, 960, 992, 1024]
    max_size: 1500
    interp: 1
    use_cv2: true
  - !Permute
    to_bgr: false
    channel_first: true
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: false
  batch_size: 2
  shuffle: true
  worker_num: 2
  use_process: false

EvalReader:
  inputs_def:
    #fields: ['image', 'im_info', 'im_id', 'im_shape']
    # for voc
    fields: ['image', 'im_info', 'im_id','im_shape', 'gt_bbox', 'gt_class', 'is_difficult']
  dataset:
    !VOCDataSet
    anno_path: val.txt
    dataset_dir: dataset/traffic_light4
    use_default_label: false
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1500
    target_size: 1000
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
  drop_empty: false
  worker_num: 2

TestReader:
  inputs_def:
    # set image_shape if needed
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    use_default_label: false
    with_background: true
    anno_path: dataset/traffic_light4/label_list.txt
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_mixup: false
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean: [0.485,0.456,0.406]
    std: [0.229, 0.224,0.225]
  - !ResizeImage
    interp: 1
    max_size: 1500
    target_size: 1000
    use_cv2: true
  - !Permute
    channel_first: true
    to_bgr: false
  batch_transforms:
  - !PadBatch
    pad_to_stride: 32
    use_padded_im_info: true
  batch_size: 1
  shuffle: false
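
For readers checking the lr values printed in the training logs, here is a minimal sketch of how the LinearWarmup and PiecewiseDecay entries in the LearningRate section are usually combined. The function name `lr_at` is hypothetical and is not a PaddleDetection API; it assumes a linear ramp from `start_factor * base_lr` to `base_lr` over `steps` iterations, followed by a multiplication by `gamma` at each milestone.

```python
from bisect import bisect_right


def lr_at(it, base_lr, warmup_steps=1000, start_factor=0.1,
          milestones=(24000, 26000), gamma=0.1):
    """Learning rate at iteration `it` (hypothetical helper, not a PaddleDetection API)."""
    if it < warmup_steps:
        # LinearWarmup: ramp linearly from start_factor * base_lr up to base_lr.
        factor = start_factor + (1.0 - start_factor) * it / warmup_steps
        return base_lr * factor
    # PiecewiseDecay: multiply by gamma once for every milestone already reached.
    return base_lr * gamma ** bisect_right(milestones, it)


for it in (0, 20, 1000, 24000, 26000):
    print(it, f"{lr_at(it, base_lr=0.0000125):.10f}")
```

Under these assumptions, base_lr 0.0000125 gives 0.00000125 at iter 0, which a six-decimal log prints as 0.000001, and base_lr 0.00125 gives 0.0001475 at iter 20, which agrees with the 0.000148 shown in the GTX 1060 log further down.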

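label_list.txt contains 4 categories. The training set has 20,007 samples, all annotated with object locations and converted to XML files.
The output during training is: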
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\backbones\fpn.py:108
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:56
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:65
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:66
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:66
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:67
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:70
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:71
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\losses\balanced_l1_loss.py:71
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
2020-12-22 18:38:04,668-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\roi_heads\cascade_head.py:199
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\layers\math_op_patch.py:278: UserWarning: D:\lyx\PaddleDetection-release-0.5\ppdet\modeling\roi_heads\cascade_head.py:206
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
W1222 18:38:23.042533 10816 device_context.cc:338] Please NOTE: device: 0, CUDA Capability: 86, Driver API Version: 11.1, Runtime API Version: 10.2
W1222 18:38:23.126313 10816 device_context.cc:346] device: 0, cuDNN Version: 7.6.
C:\ProgramData\Anaconda3\envs\pp\lib\site-packages\paddle\fluid\io.py:2110: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
format(" ".join(unused_para_list)))
W1222 18:59:18.041463 10816 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
D:\lyx\PaddleDetection-release-0.5\ppdet\data\reader.py:90: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
if isinstance(item, collections.Sequence) and len(item) == 0:
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1461: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
if 'replace' in inspect.getargspec(func)[0]:
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1463: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
assert 'replace' == inspect.getargspec(func)[0][-1]
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1468: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
if 'bboxes' not in inspect.getargspec(func)[0]:
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1456: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
if 'prob' in inspect.getargspec(func)[0]:
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1476: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
assert 'bboxes' == inspect.getargspec(func)[0][1]
D:\lyx\PaddleDetection-release-0.5\ppdet\data\transform\autoaugment_utils.py:1480: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
if 'prob' in inspect.getargspec(func)[0]:
tools/train.py:262: RuntimeWarning: divide by zero encountered in double_scalars
ips = float(cfg['TrainReader']['batch_size']) / time_cost
2020-12-22 18:59:20,002-INFO: iter: 0, lr: 0.000001, 'loss_cls_0': '7.239295', 'loss_loc_0': '0.004803', 'loss_cls_1': '2.924091', 'loss_loc_1': '0.001301', 'loss_cls_2': '2.735808', 'loss_loc_2': '0.001601', 'loss_rpn_cls': '30.671740', 'loss_rpn_bbox': '0.021397', 'loss': '43.600033', eta: 0:00:00, batch_cost: 0.00000 sec, ips: inf images/sec
2020-12-22 18:59:32,541-INFO: iter: 20, lr: 0.000001, 'loss_cls_0': '1.609347', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.804670', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.402333', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 5:44:36, batch_cost: 0.68969 sec, ips: 2.89985 images/sec
2020-12-22 18:59:45,374-INFO: iter: 40, lr: 0.000002, 'loss_cls_0': '1.608831', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.804448', 'loss_loc_1': '-0.000000', 'loss_cls_2': '0.402221', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 5:23:35, batch_cost: 0.64804 sec, ips: 3.08623 images/sec
2020-12-22 18:59:57,319-INFO: iter: 60, lr: 0.000002, 'loss_cls_0': '1.608098', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.804157', 'loss_loc_1': '-0.000000', 'loss_cls_2': '0.402075', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 4:56:07, batch_cost: 0.59344 sec, ips: 3.37016 images/sec
2020-12-22 19:00:08,984-INFO: iter: 80, lr: 0.000002, 'loss_cls_0': '1.607245', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.803822', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.401908', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 4:56:26, batch_cost: 0.59448 sec, ips: 3.36430 images/sec
2020-12-22 19:00:20,621-INFO: iter: 100, lr: 0.000002, 'loss_cls_0': '1.606289', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.803448', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.401720', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 4:46:22, batch_cost: 0.57468 sec, ips: 3.48020 images/sec
2020-12-22 19:00:33,152-INFO: iter: 120, lr: 0.000003, 'loss_cls_0': '1.605234', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.803034', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.401513', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 5:11:24, batch_cost: 0.62533 sec, ips: 3.19833 images/sec
2020-12-22 19:00:46,113-INFO: iter: 140, lr: 0.000003, 'loss_cls_0': '1.604078', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.802581', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.401287', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 5:24:37, batch_cost: 0.65229 sec, ips: 3.06613 images/sec
2020-12-22 19:00:59,035-INFO: iter: 160, lr: 0.000003, 'loss_cls_0': '1.602824', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.802090', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.401041', 'loss_loc_2': '0.000000', 'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan', eta: 5:19:38, batch_cost: 0.64272 sec, ips: 3.11175 images/sec
————————————————————————————————————————————————————————
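The loss becomes NaN from iteration 20 onward. With the same dataset and configuration file, training on another machine (Windows, 2080 Ti GPU) does not show this problem. I have tried learning rates from 0.00125 down to 0.0000125 and the NaN appears every time. Other architectures such as PP-YOLO show the same problem. I would appreciate any help. Thanks!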
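
One detail in the log above may be relevant, although nobody in this thread confirms it as the cause: the device_context.cc line reports CUDA Capability 86 together with Runtime API Version 10.2. To my knowledge, CUDA 10.2 natively targets GPUs only up to compute capability 7.5, and 8.6 is first targeted by CUDA 11.1. The sketch below parses that log line and compares the two values. The lookup table is a partial one written for illustration and the helper name is hypothetical, so please check NVIDIA's release notes before relying on it.

```python
import re

# Highest compute capability each CUDA runtime targets natively.
# Partial table for illustration only (an assumption to verify against NVIDIA docs).
MAX_CAPABILITY = {(10, 2): 75, (11, 0): 80, (11, 1): 86}

# Log line copied from the training output in the question.
LOG_LINE = ("W1222 18:38:23.042533 10816 device_context.cc:338] Please NOTE: "
            "device: 0, CUDA Capability: 86, Driver API Version: 11.1, "
            "Runtime API Version: 10.2")


def check_runtime(line):
    """Return (capability, runtime, supported) parsed from a Paddle device log line."""
    m = re.search(r"CUDA Capability: (\d+).*Runtime API Version: (\d+)\.(\d+)", line)
    if m is None:
        raise ValueError("not a device_context log line")
    capability = int(m.group(1))
    runtime = (int(m.group(2)), int(m.group(3)))
    limit = MAX_CAPABILITY.get(runtime)
    # None means the runtime version is not in the table, so no verdict is given.
    supported = None if limit is None else capability <= limit
    return capability, runtime, supported


print(check_runtime(LOG_LINE))
```

If the third value is False, one thing worth trying is a Paddle build compiled against a CUDA version that targets the GPU, and then checking whether the NaN still appears.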

3 answers

pgx2nnw81#

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version information, and error messages. You can also look for answers in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

2021-12-07

lnxxn5zx2#

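This may be caused by data loading, for example a wrong label mapping. You can also check whether the network performs operations such as division by zero or log(0).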
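
A minimal sketch of the kind of data check suggested here, assuming standard Pascal VOC XML annotations. The helper name `check_voc_xml`, the sample annotation, and the class names are hypothetical; replace the label set with the contents of your own label_list.txt. It flags unknown labels, zero-area boxes, and boxes that extend outside the image, which are common sources of bad training targets.

```python
import xml.etree.ElementTree as ET


def check_voc_xml(xml_text, labels):
    """Return a list of problems found in one VOC annotation (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.findtext(k))
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        if name not in labels:
            problems.append(f"unknown label: {name}")
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"empty box: {(xmin, ymin, xmax, ymax)}")
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append(f"box outside image: {(xmin, ymin, xmax, ymax)}")
    return problems


# Made-up annotation: the first box has zero width, the second has an unknown label.
SAMPLE = """<annotation>
  <size><width>100</width><height>80</height></size>
  <object><name>red</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>10</xmax><ymax>30</ymax></bndbox>
  </object>
  <object><name>blue</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>20</xmax><ymax>20</ymax></bndbox>
  </object>
</annotation>"""
LABELS = {"red", "yellow", "green", "off"}  # hypothetical class names

for problem in check_voc_xml(SAMPLE, LABELS):
    print(problem)
```

Running this over every XML file listed in train.txt gives a quick way to rule the annotations in or out before looking at the network itself.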

uajslkp63#

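Thanks for the reply. I rechecked my dataset carefully and found no label-mapping errors. I also trained on the same dataset on another Windows 10 single-GPU machine with a GTX 1060 6.0GB. The configuration file is identical to the previous one except that lr was changed to 0.00125. The training output is: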
2020-12-23 10:21:43,046-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-12-23 10:21:49,138-INFO: places would be ommited when DataLoader is not iterable
2020-12-23 10:22:10,298-WARNING: C:\Users\Fundway/.cache/paddle/weights\ResNet50_vd_ssld_v2_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
C:\Users\Fundway\AppData\Local\conda\conda\envs\testpp\lib\site-packages\paddle\fluid\io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
format(" ".join(unused_para_list)))
2020-12-23 10:22:53,611-INFO: places would be ommited when DataLoader is not iterable
2020-12-23 10:22:58,113-INFO: iter: 0, lr: 0.000125, 'loss_cls_0': '1.596302', 'loss_loc_0': '0.000002', 'loss_cls_1': '0.768629', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.394244', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.695158', 'loss_rpn_bbox': '0.008708', 'loss': '3.463043', time: 0.000, eta: 0:00:00
2020-12-23 10:23:24,056-INFO: iter: 20, lr: 0.000148, 'loss_cls_0': '1.253679', 'loss_loc_0': '0.000003', 'loss_cls_1': '0.597673', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.311132', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.694021', 'loss_rpn_bbox': '0.010990', 'loss': '2.863490', time: 1.449, eta: 8 days, 9:18:20
2020-12-23 10:23:49,095-INFO: iter: 40, lr: 0.000170, 'loss_cls_0': '0.021264', 'loss_loc_0': '0.000019', 'loss_cls_1': '0.001989', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.001103', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.691823', 'loss_rpn_bbox': '0.010662', 'loss': '0.736680', time: 1.262, eta: 7 days, 7:24:05
2020-12-23 10:24:15,180-INFO: iter: 60, lr: 0.000193, 'loss_cls_0': '0.018982', 'loss_loc_0': '0.000025', 'loss_cls_1': '0.000775', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.000427', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.681162', 'loss_rpn_bbox': '0.008746', 'loss': '0.709770', time: 1.282, eta: 7 days, 10:02:55
2020-12-23 10:24:39,396-INFO: iter: 80, lr: 0.000215, 'loss_cls_0': '0.058617', 'loss_loc_0': '0.000007', 'loss_cls_1': '0.017247', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.005145', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.661259', 'loss_rpn_bbox': '0.009319', 'loss': '0.757371', time: 1.264, eta: 7 days, 7:38:40
2020-12-23 10:25:07,956-INFO: iter: 100, lr: 0.000238, 'loss_cls_0': '0.044691', 'loss_loc_0': '0.000004', 'loss_cls_1': '0.007914', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.003082', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.625538', 'loss_rpn_bbox': '0.009820', 'loss': '0.694501', time: 1.413, eta: 8 days, 4:20:26
2020-12-23 10:25:28,858-INFO: iter: 120, lr: 0.000260, 'loss_cls_0': '0.036790', 'loss_loc_0': '0.000003', 'loss_cls_1': '0.002068', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.000903', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.510809', 'loss_rpn_bbox': '0.009324', 'loss': '0.575176', time: 1.053, eta: 6 days, 2:19:39
2020-12-23 10:25:48,041-INFO: iter: 140, lr: 0.000282, 'loss_cls_0': '0.045195', 'loss_loc_0': '0.000003', 'loss_cls_1': '0.000785', 'loss_loc_1': '0.000000', 'loss_cls_2': '0.000391', 'loss_loc_2': '0.000000', 'loss_rpn_cls': '0.289394', 'loss_rpn_bbox': '0.013347', 'loss': '0.351146', time: 0.938, eta: 5 days, 10:19:47
————————————————————————————————————————————————————————
So the dataset does not seem to be the problem. Is the NaN on the RTX 3070 machine caused by a mistake in my network configuration?
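
To narrow down where the NaN comes from, it can help to find which loss terms turn NaN first. Below is a minimal sketch that scans training log lines in the format shown in this thread. The function name `first_nan` is hypothetical, and the two sample lines are shortened copies of the iter 0 and iter 20 lines from the RTX 3070 log.

```python
import math
import re

ITER_RE = re.compile(r"iter: (\d+),")
LOSS_RE = re.compile(r"'(loss\w*)': '([^']+)'")


def first_nan(lines):
    """Return (iteration, [loss names that are NaN]) for the first NaN line, else None."""
    for line in lines:
        it = ITER_RE.search(line)
        if it is None:
            continue  # not a training progress line
        bad = [name for name, value in LOSS_RE.findall(line)
               if math.isnan(float(value))]
        if bad:
            return int(it.group(1)), bad
    return None


LOG = [
    "2020-12-22 18:59:20,002-INFO: iter: 0, lr: 0.000001, 'loss_cls_0': '7.239295', "
    "'loss_rpn_cls': '30.671740', 'loss_rpn_bbox': '0.021397', 'loss': '43.600033'",
    "2020-12-22 18:59:32,541-INFO: iter: 20, lr: 0.000001, 'loss_cls_0': '1.609347', "
    "'loss_rpn_cls': 'nan', 'loss_rpn_bbox': 'nan', 'loss': 'nan'",
]
print(first_nan(LOG))
```

On the log from the question, the NaN terms are loss_rpn_cls and loss_rpn_bbox (and therefore the total loss), while the cascade head losses stay finite. Note also that loss_rpn_cls is already 30.67 at iter 0 on the RTX 3070, compared with 0.695 on the GTX 1060, so the RPN branch looks abnormal before any NaN is printed.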

Paddle: lr is already set very low, but loss: NaN always appears at iter 20
