PaddleOCR table recognition training data

ep6jt1vc  posted on 2022-10-27 in Other

What is the training data format for table recognition? Which fields are required, which are optional, and what does each field mean?

kcugc4gi  #1

Here is a sample from the TAL (好未来) dataset. If I build my own data, is the structure field required?
{"filename": "1621871735627846536945182109696_0.jpg",
"html": {"cells": [{"bbox": [[10, 6], [125, 7], [128, 33], [9, 33]], "tokens": "柳树"}, {"bbox": [[125, 7], [237, 6], [242, 32], [128, 33]], "tokens": "松树"}, {"bbox": [[237, 7], [347, 5], [357, 30], [243, 32]], "tokens": "杨树"}, {"bbox": [[10, 33], [128, 33], [131, 54], [9, 53]], "tokens": "1300棵"}, {"bbox": [[128, 34], [243, 32], [250, 52], [131, 53]], "tokens": "700棵"}, {"bbox": [[243, 32], [357, 30], [367, 50], [250, 52]], "tokens": "800棵"}],
"structure": {"tokens": ["<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>"]},
"gt": "<html><body><table><tr><td>柳树</td><td>松树</td><td>杨树</td></tr><tr><td>1300棵</td><td>700棵</td><td>800棵</td></tr></table></body></html>"},
"image_id": 3358, "split": "train"}
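To make the relationship between the fields concrete, here is a minimal Python sketch of how the structure tokens and the cell texts can be interleaved to rebuild the table HTML. It is only an illustration, not PaddleOCR code: the function name rebuild_table_html is made up, the structure tokens in the sample are assumed to be the plain tr/td tokens of a 2x3 table, and the bbox values are left out for brevity.

```python
# Sample record in the layout shown above (bbox omitted for brevity).
# Assumption: a plain 2x3 table, so each row is <tr>, three <td></td> pairs, </tr>.
record = {
    "filename": "1621871735627846536945182109696_0.jpg",
    "html": {
        "cells": [
            {"tokens": "柳树"}, {"tokens": "松树"}, {"tokens": "杨树"},
            {"tokens": "1300棵"}, {"tokens": "700棵"}, {"tokens": "800棵"},
        ],
        "structure": {
            "tokens": ["<tr>", "<td>", "</td>", "<td>", "</td>",
                       "<td>", "</td>", "</tr>"] * 2
        },
    },
}


def rebuild_table_html(record):
    """Hypothetical helper: put each cell's text in front of its closing </td>."""
    structure = record["html"]["structure"]["tokens"]
    cells = record["html"]["cells"]

    # Every closing </td> must have exactly one entry in "cells".
    n_cells = sum(1 for tok in structure if tok == "</td>")
    if n_cells != len(cells):
        raise ValueError(f"{n_cells} </td> tokens but {len(cells)} cells")

    cell_iter = iter(cells)
    parts = []
    for tok in structure:
        if tok == "</td>":
            text = next(cell_iter)["tokens"]
            # Cell text may be one string or a list of characters.
            parts.append(text if isinstance(text, str) else "".join(text))
        parts.append(tok)
    return "<table>" + "".join(parts) + "</table>"


print(rebuild_table_html(record))
```

The count check also works as a quick sanity test for self-built data: if the number of closing td tokens and the number of cells differ, the structure and cells fields are out of sync.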

laik7k3q  #2

Yes, structure is required.

k97glaaz  #4

Hello, you can take a look at this document: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/ppstructure/table/README_ch.md

My table data is already annotated, but it only has coordinates. How do I convert it into the HTML format used in structure?
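One possible approach for simple tables, sketched in Python below: group the cell boxes into rows by their vertical center, sort each row from left to right, and emit one td pair per cell. This is only a rough sketch under strong assumptions (a regular grid, no merged cells, no large skew). The function name boxes_to_structure and the row_tol parameter are made up for illustration and are not part of PaddleOCR. The boxes come from the sample record earlier in this thread.

```python
def boxes_to_structure(cells, row_tol=0.5):
    """Hypothetical helper: derive structure tokens from cell boxes.

    cells: list of dicts with "bbox" as four [x, y] corner points.
    Returns (structure_tokens, cells_in_reading_order).
    Assumes a regular grid without rowspan/colspan.
    """
    def ys(cell):
        return [p[1] for p in cell["bbox"]]

    def center_y(cell):
        return sum(ys(cell)) / len(ys(cell))

    def left_x(cell):
        return min(p[0] for p in cell["bbox"])

    rows = []  # each entry: (center y of the row's first cell, list of cells)
    for cell in sorted(cells, key=center_y):
        height = max(ys(cell)) - min(ys(cell))
        # Same row if the vertical center is within row_tol * cell height.
        if rows and abs(center_y(cell) - rows[-1][0]) <= row_tol * height:
            rows[-1][1].append(cell)
        else:
            rows.append((center_y(cell), [cell]))

    tokens, ordered = [], []
    for _, row_cells in rows:
        tokens.append("<tr>")
        for cell in sorted(row_cells, key=left_x):
            tokens += ["<td>", "</td>"]
            ordered.append(cell)
        tokens.append("</tr>")
    return tokens, ordered


sample_cells = [
    {"bbox": [[10, 6], [125, 7], [128, 33], [9, 33]], "tokens": "柳树"},
    {"bbox": [[125, 7], [237, 6], [242, 32], [128, 33]], "tokens": "松树"},
    {"bbox": [[237, 7], [347, 5], [357, 30], [243, 32]], "tokens": "杨树"},
    {"bbox": [[10, 33], [128, 33], [131, 54], [9, 53]], "tokens": "1300棵"},
    {"bbox": [[128, 34], [243, 32], [250, 52], [131, 53]], "tokens": "700棵"},
    {"bbox": [[243, 32], [357, 30], [367, 50], [250, 52]], "tokens": "800棵"},
]

# Input order does not matter; the cells are re-sorted into reading order.
tokens, ordered = boxes_to_structure(list(reversed(sample_cells)))
print(tokens)
print([c["tokens"] for c in ordered])
```

The returned cells are in the same order as the td tokens, so they can be written to the cells field directly. Tables with merged cells need span attributes in the structure tokens, which this sketch does not attempt to infer.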
