What is the training data format for table recognition? Which fields are required, which are optional, and what does each field mean?
kcugc4gi1#
Below is a sample from the 好未来 (TAL) dataset. If I construct my own data, is the structure field required?
{"filename": "1621871735627846536945182109696_0.jpg", "html": {"cells": [{"bbox": [[10, 6], [125, 7], [128, 33], [9, 33]], "tokens": "柳树"}, {"bbox": [[125, 7], [237, 6], [242, 32], [128, 33]], "tokens": "松树"}, {"bbox": [[237, 7], [347, 5], [357, 30], [243, 32]], "tokens": "杨树"}, {"bbox": [[10, 33], [128, 33], [131, 54], [9, 53]], "tokens": "1300棵"}, {"bbox": [[128, 34], [243, 32], [250, 52], [131, 53]], "tokens": "700棵"}, {"bbox": [[243, 32], [357, 30], [367, 50], [250, 52]], "tokens": "800棵"}],
"structure": {"tokens": ["", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]}, "gt": "
| 柳树 | 松树 | 杨树 |
| 1300棵 | 700棵 | 800棵 |
"}, "image_id": 3358, "split": "train"}
laik7k3q2#
Yes, structure is required.
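To catch a missing structure field before training, a record can be checked with a few lines of Python. This is a minimal sketch, not part of PaddleOCR: `check_record` is a hypothetical helper, and apart from `structure` (confirmed above) the checks on `filename`, `cells`, and 4-point `bbox` values are assumptions inferred from the sample record in the question.

```python
def check_record(record):
    """Return a list of problems found in one label record (empty if none)."""
    problems = []
    # Assumption: every record names its image file, as in the sample.
    if not isinstance(record.get("filename"), str):
        problems.append("missing 'filename'")
    html = record.get("html")
    if not isinstance(html, dict):
        problems.append("missing 'html' object")
        return problems
    # Confirmed in this thread: the structure field is required.
    structure = html.get("structure")
    tokens = structure.get("tokens") if isinstance(structure, dict) else None
    if not isinstance(tokens, list) or len(tokens) == 0:
        problems.append("missing or empty 'html.structure.tokens'")
    # Assumption: cells is a list; bbox, when present, has 4 corner points.
    cells = html.get("cells")
    if not isinstance(cells, list):
        problems.append("missing 'html.cells' list")
    else:
        for i, cell in enumerate(cells):
            bbox = cell.get("bbox")
            if bbox is not None and len(bbox) != 4:
                problems.append(f"cell {i}: bbox should have 4 points")
    return problems
```

Running it over every record in a label file and printing the non-empty results gives a quick list of records to fix.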
kninwzqo3#
Hi, you can refer to this document: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/ppstructure/table/README_ch.md
k97glaaz4#
My annotated table data contains only coordinates. How can I convert it to the HTML format used in structure?
ha5z0ras5#
You can use this file: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/ppstructure/table/convert_label2html.py
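For intuition about what such a conversion involves, here is a rough sketch that derives structure tokens from cell coordinates alone. It is not the logic of convert_label2html.py: `cells_to_structure` and `row_tol` are hypothetical, the `<tr>`/`<td>`/`</td>`/`</tr>` token scheme is an assumption, and merged (rowspan/colspan) cells are not handled.

```python
def cells_to_structure(cells, row_tol=10):
    """Group 4-point cell boxes into rows, then emit one tag pair per cell.

    Assumes a simple grid with no merged cells; row_tol is the maximum
    vertical distance (in pixels) between cell centres within one row.
    """
    def center(bbox):
        xs = [p[0] for p in bbox]
        ys = [p[1] for p in bbox]
        return sum(xs) / len(xs), sum(ys) / len(ys)

    # Sort top to bottom, then start a new row whenever the vertical
    # centre moves more than row_tol away from the row's first cell.
    rows = []
    for cell in sorted(cells, key=lambda c: center(c["bbox"])[1]):
        cy = center(cell["bbox"])[1]
        if rows and abs(cy - rows[-1]["cy"]) <= row_tol:
            rows[-1]["cells"].append(cell)
        else:
            rows.append({"cy": cy, "cells": [cell]})

    tokens, ordered_cells = [], []
    for row in rows:
        tokens.append("<tr>")
        # Within a row, order cells left to right.
        for cell in sorted(row["cells"], key=lambda c: center(c["bbox"])[0]):
            tokens += ["<td>", "</td>"]
            ordered_cells.append(cell)
        tokens.append("</tr>")
    return tokens, ordered_cells
```

Applied to the six cells of the sample in the question, this produces 16 tokens, the same length as the token list there, although the actual tag strings in that sample are not visible in the post.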