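DBNet handles deformed text, artistic fonts, Chinese, English, digits, and both horizontal and vertical layouts quite well.
To train a Chinese text detector like this, you first need a Chinese dataset. I recommend two: the ICPR dataset, and the Chinese "worldwide" detection set from ICDAR.
Since our goal is OCR for printed fonts, we are less concerned with real-world scenes; the results above come from a model trained on the ICPR dataset.
The annotation format is also easy to understand: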
['61.95,195.68,58.95,229.11,235.53,226.11,234.53,196.68,Jagermeifter', '454.58,421.37,450.58,451.0,601.63,449.0,602.63,418.37,Jagermeifter', '63.11,366.16,65.16,428.74,270.21,425.79,289.26,366.16,买就送!', '482.84,524.53,482.84,534.47,569.16,532.47,571.16,521.53,JagermeifterAG', '512.63,504.42,512.63,513.95,538.32,514.95,537.32,506.42,mast', '482.47,484.21,480.47,489.68,567.63,490.68,567.63,485.21,###', '594.32,473.84,595.32,619.95,600.63,618.95,599.63,472.84,###', '482.16,616.84,483.84,625.68,567.21,632.16,565.26,621.89,HERBLIQUEUR', '363.32,574.63,365.16,590.63,441.16,589.21,445.0,574.63,GLENFRANT', '371.42,590.74,372.42,599.16,434.89,600.16,434.89,593.42,SINGLEMALT', '389.95,603.95,391.95,609.84,419.42,609.84,419.42,603.95,CHWH', '374.26,603.37,372.26,609.84,386.32,609.84,387.32,604.37,###', '372.26,615.32,374.26,621.47,436.58,621.05,434.32,615.32,TheMajorsHesenre', '380.16,628.63,379.16,632.0,426.74,633.0,426.74,628.63,###', '386.05,634.26,386.05,637.63,422.95,637.63,421.95,634.26,###', '392.63,640.42,392.63,643.37,415.63,643.37,416.63,641.42,###', '396.42,644.63,396.42,648.42,410.16,648.42,411.16,645.63,###', '362.74,787.79,362.74,796.89,443.32,799.89,444.32,787.79,###', '370.42,777.95,373.42,791.05,397.16,788.05,394.16,780.95,###', '402.47,781.89,402.47,787.79,433.21,788.79,432.21,782.89,###', '371.16,759.58,370.16,767.74,430.68,767.74,428.68,759.58,###', '379.16,749.47,380.16,753.26,426.89,754.26,426.89,750.47,###', '384.79,745.42,383.79,748.37,422.68,748.37,420.68,745.42,###', '392.37,740.21,391.37,744.16,413.53,744.16,413.53,741.21,###', '394.89,735.16,392.89,739.95,408.63,738.95,409.63,735.16,###', '496.79,256.0,495.79,264.84,523.32,258.11,518.37,253.21,###', '504.05,586.84,507.89,594.37,548.74,598.05,544.74,585.53,###', '558.11,571.26,557.11,583.42,574.53,585.42,573.53,570.26,35%', '562.74,589.32,561.74,600.42,574.26,600.42,575.26,592.32,VDl', '472.11,581.74,469.11,598.21,495.68,595.21,493.68,584.74,700ml', 
'474.47,568.16,471.47,581.79,492.58,581.79,492.74,566.16,70d', '509.53,555.79,510.53,562.11,539.58,563.11,540.58,556.79,###', '484.95,542.21,488.89,552.32,564.32,554.84,563.89,546.42,WOLFENBGTTEL', '452.16,294.47,452.16,397.21,459.16,397.21,458.16,294.47,###', '594.32,294.47,595.32,389.21,600.89,388.21,600.89,294.47,###', '588.16,272.16,595.32,288.16,600.47,286.47,592.89,265.26,###', '555.42,252.37,554.74,259.95,584.37,270.05,585.74,264.16,###', '525.53,250.95,529.63,259.95,552.21,259.95,554.47,251.37,###', '460.26,290.11,456.79,284.21,488.47,259.79,493.84,268.21,###', '673.68,313.84,667.68,605.17,707.68,603.17,708.02,319.17,Jagermeifter']
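This is what the annotation for a single image looks like; I parsed the text file into a list. Each entry describes one line of text: eight comma-separated values giving the (x, y) coordinates of its four corner points, followed by the transcription.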
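To make the format concrete, here is a minimal sketch that parses such entries into corner points and text. The function names are my own and do not come from the DBNet repo. Treating `###` as a marker for regions without a usable transcription is an assumption based on the common ICDAR convention; the article itself does not say so.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_annotation(entry: str) -> Tuple[List[Point], str]:
    """Split one entry into four (x, y) corner points and its transcription."""
    # Split on the first 8 commas only, so commas inside the text survive.
    parts = entry.split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a text field: {entry!r}")
    coords = [float(v) for v in parts[:8]]
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return points, parts[8]


def is_ignored(text: str) -> bool:
    """Assumption: '###' marks a region without a usable transcription."""
    return text == "###"


# Two entries copied from the sample annotation above.
entries = [
    "61.95,195.68,58.95,229.11,235.53,226.11,234.53,196.68,Jagermeifter",
    "482.47,484.21,480.47,489.68,567.63,490.68,567.63,485.21,###",
]
parsed = [parse_annotation(e) for e in entries]
labelled = [(pts, txt) for pts, txt in parsed if not is_ignored(txt)]
```

Whether `###` regions should be skipped or kept as "don't care" masks during training depends on the training code, so the filter at the end is only illustrative.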
The DBNet training and demo code is available at the link below:
http://manaai.cn/aicodes_detail3.html?id=65
The DBNet model structure is shown in the figure:
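For readers without the figure at hand: the "DB" in DBNet stands for differentiable binarization. As described in the DBNet paper, the network predicts a probability map P and a threshold map T, and combines them into an approximate binary map B = 1 / (1 + exp(-k (P - T))), with k = 50 reported there. The sketch below shows only this one step for a single pixel in plain Python; the function name is illustrative and is not taken from the linked code.

```python
import math


def differentiable_binarization(p: float, t: float, k: float = 50.0) -> float:
    """Approximate step function: close to 1 when p > t, close to 0 when p < t."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# A pixel well above its threshold, exactly at it, and well below it.
above = differentiable_binarization(0.9, 0.3)
equal = differentiable_binarization(0.3, 0.3)
below = differentiable_binarization(0.1, 0.3)
```

Because the function is smooth, gradients can flow through the binarization during training, which a hard threshold would not allow.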
Start training:
First symlink the ICPR dataset to datasets. After that, training can start without changing any configuration:
./single_gpu_train.sh
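The symlink step can also be scripted. The following is a rough sketch using only the standard library; the helper name `link_dataset` and the source folder name `icpr_data` are hypothetical, and the demonstration runs in a temporary directory, so substitute wherever your ICPR data actually lives.

```python
import os
import tempfile


def link_dataset(src_dir: str, link_path: str) -> None:
    """Create link_path as a symlink to src_dir, replacing a stale link."""
    src_dir = os.path.abspath(src_dir)
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(src_dir)
    # Only remove an existing symlink, never a real directory.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(src_dir, link_path, target_is_directory=True)


# Demonstration in a temporary directory; "icpr_data" is a hypothetical name.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "icpr_data")
    os.makedirs(src)
    link = os.path.join(root, "datasets")
    link_dataset(src, link)
    link_ok = os.path.islink(link) and os.path.samefile(link, src)
```

In a real checkout you would call `link_dataset` once from the repository root with `datasets` as the link path.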
To run prediction, execute:
python3 demo.py --model_path output/DBNet_resnet18_FPN_DBHead/checkpoint/model_best.pth --data ./imgs/
This produces the detection results:
Original article: https://blog.csdn.net/kwame211/article/details/121385378