PaddleOCR PGNet training - Slow E2E Metric calculation when number of texts per image > 100

hc8w905p · posted 2022-11-05 in: Other

My test images are 1200 x 1696 pixels, and the number of text instances per image can reach 500 (similar to one page of a book).
I trained PGNet and the training epochs run fine, but evaluation is really slow.
I investigated your code and the issue comes from the get_socre_A function in E2EMetric: once e2e_info_list and gt_info_list hold more than 100 entries, the process takes a very long time to complete (about 13 s per page with 120 annotated text blocks, and 31 s with 199 text blocks). That growth is roughly consistent with a quadratic pairwise comparison over the text instances.
This function seems to run entirely on the CPU, doesn't it?
Do you have any solution for this one? Thank you.
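For context on why this gets so slow: mode A of E2EMetric scores detections with a DetEval-style protocol that builds pairwise sigma/tau overlap matrices between every ground-truth polygon and every detected polygon, in plain Python. A minimal sketch of that kind of computation (shapely stands in for PaddleOCR's internal polygon helpers; the function and variable names are illustrative, not PaddleOCR's actual code):

```python
# Illustrative sketch of the DetEval-style sigma/tau matrices that a
# mode-A E2E evaluation builds per image -- NOT PaddleOCR's actual code.
import numpy as np
from shapely.geometry import Polygon

def overlap_matrices(gt_polys, det_polys):
    """sigma[i, j] = inter(gt_i, det_j) / area(gt_i),
    tau[i, j] = inter(gt_i, det_j) / area(det_j)."""
    gts = [Polygon(p) for p in gt_polys]
    dets = [Polygon(p) for p in det_polys]
    sigma = np.zeros((len(gts), len(dets)))
    tau = np.zeros((len(gts), len(dets)))
    # O(N_gt * N_det) exact polygon intersections, all single-threaded
    # on the CPU: ~200 boxes per side already means ~40,000 clipping ops.
    for i, g in enumerate(gts):
        for j, d in enumerate(dets):
            inter = g.intersection(d).area
            if g.area > 0:
                sigma[i, j] = inter / g.area
            if d.area > 0:
                tau[i, j] = inter / d.area
    return sigma, tau
```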

8iwquhpp1#

My config file:

```yaml
Global:
  use_gpu: True
  epoch_num: 600
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/pgnet_r50_vd_totaltext/
  save_epoch_step: 10
  eval_batch_step: [ 0, 1000 ]
  cal_metric_during_train: False
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  valid_set: partvgg # two modes: totaltext evaluates curved words, partvgg evaluates non-curved words
  save_res_path: ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt
  character_dict_path: ppocr/utils/ko_2463.txt
  character_type: korean
  max_text_length: 25 # the max length in seq
  max_text_nums: 1000 # the max seq nums in a pic
  tcl_len: 64

Architecture:
  model_type: e2e
  algorithm: PGNet
  Transform:
  Backbone:
    name: ResNet
    layers: 50
  Neck:
    name: PGFPN
  Head:
    name: PGHead
    out_channels: 2464 # Loss.pad_num + 1

Loss:
  name: PGLoss
  tcl_bs: 64
  max_text_length: 25 # the same as Global: max_text_length
  max_text_nums: 1000 # the same as Global: max_text_nums
  pad_num: 2463 # the length of dict for pad

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
  regularizer:
    name: 'L2'
    factor: 0.

PostProcess:
  name: PGPostProcess
  score_thresh: 0.5
  mode: fast # two ways: fast or slow

Metric:
  name: E2EMetric
  mode: A # two ways for eval, A: label from txt, B: label from gt_mat
  gt_mat_dir: ./Synthetic_ko_total_text/gt # the dir of gt_mat
  character_dict_path: ppocr/utils/ko_2463.txt
  main_indicator: f_score_e2e

Train:
  dataset:
    name: PGDataSet
    data_dir: /home/gridone/TextRecognitionDataGenerator/out/document_v6/train
    label_file_list: [/home/gridone/TextRecognitionDataGenerator/out/document_v6/train/train.txt]
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTrain:
      - IaaAugment:
          augmenter_args:
            - { 'type': Affine, 'args': { 'rotate': [-90, 0, 90, 180], 'fit_output': True, 'cval': 255 } }
      - PGProcessTrain:
          batch_size: 4 # same as loader: batch_size_per_card
          min_crop_size: 24
          min_text_size: 4
          max_text_size: 512
      - KeepKeys:
          keep_keys: [ 'images', 'tcl_maps', 'tcl_label_maps', 'border_maps', 'direction_maps', 'training_masks', 'label_list', 'pos_list', 'pos_mask' ] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: True
    batch_size_per_card: 4
    num_workers: 16

Eval:
  dataset:
    name: PGDataSet
    data_dir: /home/gridone/TextRecognitionDataGenerator/out/document_v6/test
    label_file_list: [/home/gridone/TextRecognitionDataGenerator/out/document_v6/test/test.txt]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTest:
      - E2EResizeForTest:
          max_side_len: 768
      - NormalizeImage:
          scale: 1./255.
          mean: [ 0.485, 0.456, 0.406 ]
          std: [ 0.229, 0.224, 0.225 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'shape', 'polys', 'texts', 'ignore_tags', 'img_id' ]
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 16
```

OS: Ubuntu 18.04
GPU: NVIDIA Titan X

xxe27gdn2#

I think I solved it by applying joblib.Parallel to the sigma and tau calculation steps :)
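A minimal sketch of that idea, reusing the shapely-based stand-in from the question above (the row-wise split and helper names are illustrative, not the actual patch to PaddleOCR's Deteval code):

```python
# Illustrative sketch: parallelize the sigma/tau rows with joblib.
# NOT the actual patch; helper names are made up for this example.
import numpy as np
from joblib import Parallel, delayed
from shapely.geometry import Polygon

def _row(gt_poly, det_polys):
    """One row of sigma and tau for a single ground-truth polygon."""
    g = Polygon(gt_poly)
    sigma_row = np.zeros(len(det_polys))
    tau_row = np.zeros(len(det_polys))
    for j, dp in enumerate(det_polys):
        d = Polygon(dp)
        inter = g.intersection(d).area
        if g.area > 0:
            sigma_row[j] = inter / g.area
        if d.area > 0:
            tau_row[j] = inter / d.area
    return sigma_row, tau_row

def overlap_matrices_parallel(gt_polys, det_polys, n_jobs=-1):
    # Each row is independent, so the O(N_gt * N_det) loop splits
    # cleanly across CPU cores; n_jobs=-1 uses all available cores.
    rows = Parallel(n_jobs=n_jobs)(
        delayed(_row)(gp, det_polys) for gp in gt_polys
    )
    sigma = np.stack([r[0] for r in rows])
    tau = np.stack([r[1] for r in rows])
    return sigma, tau
```

With joblib's default loky backend each worker is a separate process, so the detection list is pickled once per row; for a few hundred small polygons that overhead is negligible next to the intersection work itself.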
