Please provide the following information to quickly locate the problem
- System Environment: docker
- Version: PaddleOCR: paddleocr-release-2.5; Paddle: 2.1.0
- Command Code:
- Complete Error Message: no error
- yml configuration:
```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_model/ppocrv3_en/
  save_epoch_step: 1
  eval_batch_step: [13540, 2708]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: True
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./rec_dataset/alphabet.txt
  max_text_length: &max_text_length 150
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/ppocrv3_en/predicts_ppocrv3_en.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./rec_dataset/
    ext_op_transform_idx: 1
    label_file_list:
    - ./rec_dataset/en_train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [32, 640, 3] # [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 32, 640] # [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4
    use_shared_memory: False
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./rec_dataset/
    label_file_list:
    - ./rec_dataset/en_val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 32, 640] # [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 4
    use_shared_memory: False
```
- Accuracy after 9 epochs of training:
[2022/07/18 09:11:28] ppocr INFO: cur metric, acc: 0.086679999982664, norm_edit_dis: 0.7975049071977204, fps: 1035.472484016947
[2022/07/18 09:11:30] ppocr INFO: save best model is to ./output/rec_model/ppocrv3_en_0718/best_accuracy
[2022/07/18 09:11:30] ppocr INFO: best metric, acc: 0.086679999982664, norm_edit_dis: 0.7975049071977204, fps: 1035.472484016947, best_epoch: 8
[2022/07/18 09:11:33] ppocr INFO: save model in ./output/rec_model/ppocrv3_en_0718/latest
[2022/07/18 09:11:35] ppocr INFO: save model in ./output/rec_model/ppocrv3_en_0718/iter_epoch_8
[2022/07/18 09:11:46] ppocr INFO: epoch: [9/100], global_step: 21670, lr: 0.000998, acc: 0.031250, norm_edit_dis: 0.727142, CTCLoss: 61.790436, SARLoss: 0.804682, loss: 62.581017, avg_reader_cost: 1.03217 s, avg_batch_cost: 1.58362 s, avg_samples: 38.4, ips: 24.24820 samples/s, eta: 2 days, 10:54:21
[2022/07/18 09:11:54] ppocr INFO: epoch: [9/100], global_step: 21680, lr: 0.000998, acc: 0.062500, norm_edit_dis: 0.731419, CTCLoss: 63.198486, SARLoss: 0.815200, loss: 64.027908, avg_reader_cost: 0.00131 s, avg_batch_cost: 0.73033 s, avg_samples: 64.0, ips: 87.63198 samples/s, eta: 2 days, 10:53:59
[2022/07/18 09:12:01] ppocr INFO: epoch: [9/100], global_step: 21690, lr: 0.000998, acc: 0.062500, norm_edit_dis: 0.727574, CTCLoss: 62.317360, SARLoss: 0.813799, loss: 63.136574, avg_reader_cost: 0.00041 s, avg_batch_cost: 0.73126 s, avg_samples: 64.0, ips: 87.51965 samples/s, eta: 2 days, 10:53:36
[2022/07/18 09:12:08] ppocr INFO: epoch: [9/100], global_step: 21700, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.725965, CTCLoss: 61.712959, SARLoss: 0.790435, loss: 62.543045, avg_reader_cost: 0.00055 s, avg_batch_cost: 0.73522 s, avg_samples: 64.0, ips: 87.04859 samples/s, eta: 2 days, 10:53:14
[2022/07/18 09:12:16] ppocr INFO: epoch: [9/100], global_step: 21710, lr: 0.000998, acc: 0.039062, norm_edit_dis: 0.730681, CTCLoss: 60.901382, SARLoss: 0.780803, loss: 61.697411, avg_reader_cost: 0.00055 s, avg_batch_cost: 0.72823 s, avg_samples: 64.0, ips: 87.88444 samples/s, eta: 2 days, 10:52:52
[2022/07/18 09:12:23] ppocr INFO: epoch: [9/100], global_step: 21720, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.735047, CTCLoss: 58.004299, SARLoss: 0.789746, loss: 58.785194, avg_reader_cost: 0.00139 s, avg_batch_cost: 0.73808 s, avg_samples: 64.0, ips: 86.71122 samples/s, eta: 2 days, 10:52:30
[2022/07/18 09:12:30] ppocr INFO: epoch: [9/100], global_step: 21730, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.736219, CTCLoss: 58.464737, SARLoss: 0.791309, loss: 59.240974, avg_reader_cost: 0.00053 s, avg_batch_cost: 0.74419 s, avg_samples: 64.0, ips: 85.99914 samples/s, eta: 2 days, 10:52:10
[2022/07/18 09:12:38] ppocr INFO: epoch: [9/100], global_step: 21740, lr: 0.000998, acc: 0.031250, norm_edit_dis: 0.732627, CTCLoss: 62.068687, SARLoss: 0.780804, loss: 62.813091, avg_reader_cost: 0.00057 s, avg_batch_cost: 0.77975 s, avg_samples: 64.0, ips: 82.07759 samples/s, eta: 2 days, 10:51:53
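After 9 epochs of training the accuracy is still below 0.1, even though the normalized edit distance is already about 0.7. This feels wrong to me: compared with CRNN training, this run is far worse. Is my configuration file wrong, or is there something else I need to change?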
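A low `acc` next to a much higher `norm_edit_dis` is what you would expect with long labels, because `acc` only counts whole strings that match exactly, while `norm_edit_dis` gives credit for every correct character. The minimal sketch below reimplements both numbers in plain Python, assuming they behave like PaddleOCR's `RecMetric` (exact match for `acc`; one minus the mean Levenshtein distance divided by the longer string's length for `norm_edit_dis`). It is an illustration, not the library code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rec_metrics(preds, targets, ignore_space=False):
    """Return (acc, norm_edit_dis) in the style ASSUMED for RecMetric."""
    correct, dist_sum = 0, 0.0
    for pred, target in zip(preds, targets):
        if ignore_space:
            pred, target = pred.replace(" ", ""), target.replace(" ", "")
        if pred == target:
            correct += 1  # acc: only whole-string matches count
        # normalized distance: edits divided by the longer string's length
        dist_sum += levenshtein(pred, target) / max(len(pred), len(target), 1)
    n = max(len(targets), 1)
    return correct / n, 1.0 - dist_sum / n


# Four 100-character labels, each predicted with exactly one wrong character.
acc, ned = rec_metrics(["a" * 99 + "b"] * 4, ["a" * 100] * 4)
print(acc, round(ned, 4))  # 0.0 0.99
```

With 100-character labels and one wrong character each, `acc` drops to 0 while `norm_edit_dis` stays at 0.99, so the longer the labels, the further apart the two numbers can drift.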
3 answers
qojgxg4l1#
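The CTC loss fluctuates a lot. A likely cause is the current shape=[3, 32, 640]: in the GTC strategy the CTC branch is optimized separately and its gradient is not back-propagated, so the setup may not suit scenarios with very long text. I suggest dropping the GTC strategy and training LCNet_SVTR alone. You can refer to this issue when modifying the configuration file: #6355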
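The long-text concern can be checked with simple arithmetic. CTC can only emit a label if the model produces at least as many output frames as the label has characters, plus one extra blank frame between each pair of identical adjacent characters. The sketch below assumes the backbone shrinks the input width by a factor of 8 in total, which is my reading of `MobileNetV1Enhance` with `last_conv_stride: [1, 2]` and `last_pool_type: avg`; verify it by printing the shape of the tensor that reaches the head. Under that assumption, a 640-pixel-wide input gives 80 frames, fewer than the configured `max_text_length: 150`.

```python
def ctc_min_frames(label: str) -> int:
    """Minimum number of CTC output frames needed to emit `label`.

    Every character needs one frame, and each pair of identical adjacent
    characters (the "ll" in "hello") needs an extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def ctc_time_steps(img_width: int, width_downsample: int = 8) -> int:
    """Sequence length reaching the CTC head.

    ASSUMPTION: the backbone divides the input width by 8 in total.
    """
    return img_width // width_downsample


def fits_ctc(label: str, img_width: int, width_downsample: int = 8) -> bool:
    """True if CTC can align `label` at all for an image of this width."""
    return ctc_min_frames(label) <= ctc_time_steps(img_width, width_downsample)


print(ctc_time_steps(640))       # 80 frames for image_shape [3, 32, 640]
print(ctc_time_steps(320))       # 40 frames for the default [3, 48, 320]
print(ctc_min_frames("hello"))   # 6
print(fits_ctc("ab" * 50, 640))  # False: 100 characters > 80 frames
```

If the downsampling assumption holds, any training label that needs more frames than this cannot be aligned by the CTC branch, no matter how long training runs.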
pgpifvop2#
I changed part of the en_PP-OCRv3_rec.yml configuration file to:
```yaml
# inside the Architecture section
Neck:
  name: SequenceEncoder
  encoder_type: svtr
  dims: 64
  depth: 2
  hidden_dims: 120
  use_guide: False
Head:
  name: CTCHead
  fc_decay: 0.00001
# top level
Loss:
  name: CTCLoss
```
After 75 epochs of training, the accuracy is as follows:
[2022/07/21 06:28:15] ppocr INFO: cur metric, acc: 0.572419999885516, norm_edit_dis: 0.9788205213278756, fps: 952.368972133215
[2022/07/21 06:28:15] ppocr INFO: best metric, acc: 0.578679999884264, norm_edit_dis: 0.978377395575744, fps: 1000.5441859767002, best_epoch: 69
[2022/07/21 06:28:17] ppocr INFO: save model in ./output/rec_model_2022/ppocrv3_en_0720/latest
[2022/07/21 06:28:17] ppocr INFO: save model in ./output/rec_model_2022/ppocrv3_en_0720/iter_epoch_74
[2022/07/21 06:28:45] ppocr INFO: epoch: [75/500], global_step: 100200, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.939163, loss: 12.896205, avg_reader_cost: 2.62400 s, avg_batch_cost: 3.03456 s, avg_samples: 51.2, ips: 16.87232 samples/s, eta: 5 days, 11:19:27
[2022/07/21 06:28:52] ppocr INFO: epoch: [75/500], global_step: 100210, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.937355, loss: 12.740088, avg_reader_cost: 0.00044 s, avg_batch_cost: 0.67201 s, avg_samples: 128.0, ips: 190.47299 samples/s, eta: 5 days, 11:19:10
[2022/07/21 06:28:58] ppocr INFO: epoch: [75/500], global_step: 100220, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.937217, loss: 13.312311, avg_reader_cost: 0.00068 s, avg_batch_cost: 0.55925 s, avg_samples: 128.0, ips: 228.87754 samples/s, eta: 5 days, 11:18:47
[2022/07/21 06:29:04] ppocr INFO: epoch: [75/500], global_step: 100230, lr: 0.000954, acc: 0.343750, norm_edit_dis: 0.939586, loss: 13.312311, avg_reader_cost: 0.00061 s, avg_batch_cost: 0.59037 s, avg_samples: 128.0, ips: 216.81468 samples/s, eta: 5 days, 11:18:25
[2022/07/21 06:29:10] ppocr INFO: epoch: [75/500], global_step: 100240, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.939396, loss: 13.581049, avg_reader_cost: 0.07561 s, avg_batch_cost: 0.65824 s, avg_samples: 128.0, ips: 194.45798 samples/s, eta: 5 days, 11:18:08
[2022/07/21 06:29:19] ppocr INFO: epoch: [75/500], global_step: 100250, lr: 0.000954, acc: 0.355469, norm_edit_dis: 0.938832, loss: 13.796519, avg_reader_cost: 0.27817 s, avg_batch_cost: 0.86297 s, avg_samples: 128.0, ips: 148.32440 samples/s, eta: 5 days, 11:18:02
[2022/07/21 06:29:25] ppocr INFO: epoch: [75/500], global_step: 100260, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.940922, loss: 12.944593, avg_reader_cost: 0.00222 s, avg_batch_cost: 0.56770 s, avg_samples: 128.0, ips: 225.47009 samples/s, eta: 5 days, 11:17:40
[2022/07/21 06:29:37] ppocr INFO: epoch: [75/500], global_step: 100270, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.936109, loss: 13.633232, avg_reader_cost: 0.63327 s, avg_batch_cost: 1.27472 s, avg_samples: 128.0, ips: 100.41385 samples/s, eta: 5 days, 11:17:58
[2022/07/21 06:29:48] ppocr INFO: epoch: [75/500], global_step: 100280, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.933838, loss: 14.711888, avg_reader_cost: 0.51032 s, avg_batch_cost: 1.09601 s, avg_samples: 128.0, ips: 116.78701 samples/s, eta: 5 days, 11:18:05
[2022/07/21 06:29:54] ppocr INFO: epoch: [75/500], global_step: 100290, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.933997, loss: 14.596161, avg_reader_cost: 0.00083 s, avg_batch_cost: 0.59570 s, avg_samples: 128.0, ips: 214.87346 samples/s, eta: 5 days, 11:17:44
[2022/07/21 06:30:06] ppocr INFO: epoch: [75/500], global_step: 100300, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.937798, loss: 13.345118, avg_reader_cost: 0.53199 s, avg_batch_cost: 1.18358 s, avg_samples: 128.0, ips: 108.14639 samples/s, eta: 5 days, 11:17:57
[2022/07/21 06:30:12] ppocr INFO: epoch: [75/500], global_step: 100310, lr: 0.000954, acc: 0.343750, norm_edit_dis: 0.941131, loss: 12.990923, avg_reader_cost: 0.00068 s, avg_batch_cost: 0.57355 s, avg_samples: 128.0, ips: 223.17010 samples/s, eta: 5 days, 11:17:35
[2022/07/21 06:30:22] ppocr INFO: epoch: [75/500], global_step: 100320, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.934477, loss: 13.830566, avg_reader_cost: 0.46184 s, avg_batch_cost: 1.04353 s, avg_samples: 128.0, ips: 122.66095 samples/s, eta: 5 days, 11:17:39
[2022/07/21 06:30:32] ppocr INFO: epoch: [75/500], global_step: 100330, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.937619, loss: 13.486568, avg_reader_cost: 0.35058 s, avg_batch_cost: 0.92052 s, avg_samples: 128.0, ips: 139.05131 samples/s, eta: 5 days, 11:17:37
[2022/07/21 06:30:37] ppocr INFO: epoch: [75/500], global_step: 100340, lr: 0.000954, acc: 0.375000, norm_edit_dis: 0.939735, loss: 11.924157, avg_reader_cost: 0.00058 s, avg_batch_cost: 0.58131 s, avg_samples: 128.0, ips: 220.19261 samples/s, eta: 5 days, 11:17:15
[2022/07/21 06:30:47] ppocr INFO: epoch: [75/500], global_step: 100350, lr: 0.000954, acc: 0.367187, norm_edit_dis: 0.937912, loss: 12.060099, avg_reader_cost: 0.32633 s, avg_batch_cost: 0.91873 s, avg_samples: 128.0, ips: 139.32223 samples/s, eta: 5 days, 11:17:12
[2022/07/21 06:30:58] ppocr INFO: epoch: [75/500], global_step: 100360, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.936123, loss: 13.900740, avg_reader_cost: 0.38886 s, avg_batch_cost: 1.11041 s, avg_samples: 128.0, ips: 115.27266 samples/s, eta: 5 days, 11:17:21
[2022/07/21 06:31:04] ppocr INFO: epoch: [75/500], global_step: 100370, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.940024, loss: 12.774862, avg_reader_cost: 0.00087 s, avg_batch_cost: 0.58649 s, avg_samples: 128.0, ips: 218.24852 samples/s, eta: 5 days, 11:16:59
.......
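Single log lines are noisy, so comparing one batch's `acc` with the eval result can mislead; note also that, per the config above, training batches go through `RecConAug` and `RecAug` while the Eval pipeline does not. A hypothetical helper like the one below (the names `parse_train_line` and `summarize` are mine, not part of PaddleOCR) pulls the metric values out of `ppocr INFO` training lines in the format shown above and averages them over a stretch of the log.

```python
import re
from statistics import mean

# Metric fields printed on ppocr training lines, e.g. "acc: 0.351562".
FIELD = re.compile(r"\b(acc|norm_edit_dis|loss|CTCLoss|SARLoss): ([-+]?\d+(?:\.\d+)?)")


def parse_train_line(line: str) -> dict:
    """Metric values from one training line; {} for anything else (e.g. eval lines)."""
    if "global_step:" not in line:
        return {}
    return {name: float(value) for name, value in FIELD.findall(line)}


def summarize(lines) -> dict:
    """Mean of each metric over all training lines in `lines`."""
    rows = [row for row in map(parse_train_line, lines) if row]
    names = sorted({name for row in rows for name in row})
    return {name: mean(row[name] for row in rows if name in row) for name in names}
```

Feeding `summarize` the lines of a saved training log (any iterable of strings works) gives the mean training `acc`, `norm_edit_dis` and losses for that window, which is a fairer number to set next to the `cur metric` eval line.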
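At this point the validation set reaches acc: 0.572419999885516, norm_edit_dis: 0.9788205213278756, while the training accuracy is only about 0.3 with a normalized edit distance of 0.93.
Question 1: Is it normal for validation accuracy to be higher than the running training accuracy? Or is training still far from converged?
Question 2: With RecResizeImg image_shape: [3, 32, 640], is an image width of 640 not allowed? Can it only go up to 320?
Question 3: With max_text_length: 150, does such a long text length hurt recognition performance?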
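For question 2, the resize arithmetic is easy to sketch. My understanding of `RecResizeImg` in PaddleOCR is: scale the crop to the target height keeping its aspect ratio, cap the width at the target width, zero-pad on the right, and report `valid_ratio` as the share of the width that holds real pixels. The function below reproduces only that size calculation (no image data) under that assumption; `resized_width` is a hypothetical name.

```python
import math


def resized_width(h: int, w: int, img_h: int = 32, img_w: int = 640):
    """Width a crop of size h x w occupies after the ASSUMED RecResizeImg logic,
    plus valid_ratio (the share of the padded canvas holding real pixels)."""
    ratio = w / float(h)
    # keep the aspect ratio at height img_h, but never exceed img_w
    new_w = min(img_w, int(math.ceil(img_h * ratio)))
    valid_ratio = min(1.0, new_w / img_w)
    return new_w, valid_ratio


# A 40 x 400 crop (aspect ratio 10:1):
print(resized_width(40, 400))                       # (320, 0.5): right half is padding
print(resized_width(40, 400, img_h=48, img_w=320))  # (320, 1.0): width capped, text squeezed
```

Under this reading, nothing in the resize step itself forbids a width of 640; what changes is how much of the canvas a crop of a given aspect ratio fills and how much is padding.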
b09cbbtk3#
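I suggest reducing max_text_length; accuracy should improve. You will have to find the exact value by experiment; it is trial and error. In my tests on Chinese text, a length of 12 worked best; I haven't tried English.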
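Before choosing a smaller `max_text_length`, it helps to know how many training labels would exceed it. As far as I know, PaddleOCR's label encoders return nothing for a label longer than `max_text_length`, and `SimpleDataSet` then draws another sample instead, so over-long samples are skipped rather than truncated; check this against your version. The sketch assumes the default `SimpleDataSet` label format, one `image_path<TAB>label` pair per line; `label_lengths`, `coverage` and `report` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def label_lengths(lines: Iterable[str], delimiter: str = "\t") -> List[int]:
    """Label lengths from a label file ASSUMED to hold `image_path<delimiter>label` per line."""
    lengths = []
    for line in lines:
        parts = line.rstrip("\r\n").split(delimiter, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        lengths.append(len(parts[1]))
    return lengths


def coverage(lengths: List[int], candidates: Iterable[int]) -> Dict[int, float]:
    """Fraction of labels no longer than each candidate max_text_length."""
    total = max(len(lengths), 1)
    return {m: sum(1 for n in lengths if n <= m) / total for m in candidates}


def report(path: str, candidates=(25, 50, 80, 150), delimiter: str = "\t") -> None:
    """Print label coverage for each candidate value, plus the longest label."""
    with open(path, encoding="utf-8") as f:
        lengths = label_lengths(f, delimiter)
    for m, frac in coverage(lengths, candidates).items():
        print(f"max_text_length={m}: {frac:.1%} of labels fit")
    print("longest label:", max(lengths, default=0))
```

Calling `report("./rec_dataset/en_train.txt")` with the path from the config above prints the share of labels that fit each candidate value; the candidate list is only an example.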