Reading PASCAL VOC annotations in Python

Posted by vyswwuz2 on 2023-01-22 in Python


I have annotations in XML files, such as the one below, which follows the PASCAL VOC convention:

<annotation>
<folder>training</folder>
<filename>chanel1.jpg</filename>
<source>
<database>synthetic initialization</database>
<annotation>PASCAL VOC2007</annotation>
<image>synthetic</image>
<flickrid>none</flickrid>
</source>
<owner>
<flickrid>none</flickrid>
<name>none</name>
</owner>
<size>
<width>640</width>
<height>427</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>chanel</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>344</xmin>
<ymin>10</ymin>
<xmax>422</xmax>
<ymax>83</ymax>
</bndbox>
</object>
<object>
<name>chanel</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>355</xmin>
<ymin>165</ymin>
<xmax>443</xmax>
<ymax>206</ymax>
</bndbox>
</object>
</annotation>

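For example, what is the most concise way to retrieve the fields filename and bndbox in Python?
I tried ElementTree, which seems to be the official Python solution, but I can't get it to work.
My code so far: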

from xml.etree import ElementTree as ET
tree = ET.parse("data/all/annotations/" + file)
fn = tree.find('filename').text
boxes = tree.findall('bndbox')

This yields

fn == 'chanel1.jpg'
boxes == []

So it successfully extracts the filename field, but not the bndbox elements.

python-3.x

Source: https://stackoverflow.com/questions/53317592/reading-pascal-voc-annotations-in-python

3 Answers

voj3qocg1#

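Here is a very simple solution to your problem.
It returns the filename together with the box coordinates as a nested list of [xmin, ymin, xmax, ymax] entries. I once struggled with bndbox tags whose children came in a mixed order (ymin, xmin, ...) or some other odd combination, so this code reads each value by tag name rather than by position.
I have since updated the code. Thanks to craq and Pritesh Gohil, you were absolutely right.
Hope this helps...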

import xml.etree.ElementTree as ET

def read_content(xml_file: str):
    """Return the filename and a list of [xmin, ymin, xmax, ymax] boxes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # Read the filename once, outside the loop, so that it is defined
    # even when the annotation contains no <object> elements.
    filename = root.find('filename').text

    list_with_all_boxes = []

    for boxes in root.iter('object'):
        # Look each coordinate up by tag name, not by position.
        ymin = int(boxes.find("bndbox/ymin").text)
        xmin = int(boxes.find("bndbox/xmin").text)
        ymax = int(boxes.find("bndbox/ymax").text)
        xmax = int(boxes.find("bndbox/xmax").text)

        list_with_all_boxes.append([xmin, ymin, xmax, ymax])

    return filename, list_with_all_boxes

name, boxes = read_content("file.xml")
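As for why the attempt in the question returned an empty list: a bare tag name passed to findall only matches direct children of the element it is called on, and bndbox sits one level deeper, inside object. A minimal sketch of the difference, using a trimmed-down copy of the question's annotation (the variable names are just for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the annotation from the question (one object only).
XML = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name>
    <bndbox><xmin>344</xmin><ymin>10</ymin><xmax>422</xmax><ymax>83</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(XML)

direct = root.findall("bndbox")          # direct children of <annotation> only
nested = root.findall("object/bndbox")   # explicit path through <object>
anywhere = root.findall(".//bndbox")     # any depth below the root

print(len(direct), len(nested), len(anywhere))
```

Either of the last two forms, or root.iter('bndbox'), should find the boxes; filename worked in the question only because it is a direct child of the root.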


d5vmydt92#

Another option is to use the third-party xmltodict library to load the VOC XML into a Python dict.

import xmltodict

with open('/path/to/voc.xml') as file:
    file_data = file.read()
    # parse() returns a nested dict keyed by tag name
    dict_data = xmltodict.parse(file_data)
    print(dict_data)
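One thing to watch with the dict approach: as far as I know, xmltodict gives you a list under 'object' when there are several objects but a single dict when there is only one, and no key at all when there are none. Below is a rough sketch of a helper that evens this out. The function name boxes_from_voc_dict is made up for this example, and the sample dict is written by hand to mimic what I would expect xmltodict.parse to return for a one-object file, so treat both as assumptions.

```python
def boxes_from_voc_dict(voc):
    """Return (filename, [[xmin, ymin, xmax, ymax], ...]) from a parsed VOC dict."""
    ann = voc["annotation"]
    objects = ann.get("object", [])      # key is missing when there are no objects
    if isinstance(objects, dict):        # a single object is not wrapped in a list
        objects = [objects]
    boxes = []
    for obj in objects:
        bb = obj["bndbox"]
        # values arrive as strings, so convert them explicitly
        boxes.append([int(bb[k]) for k in ("xmin", "ymin", "xmax", "ymax")])
    return ann["filename"], boxes


# Hand-written stand-in for the dict produced from a one-object annotation.
sample = {
    "annotation": {
        "filename": "chanel1.jpg",
        "object": {
            "name": "chanel",
            "bndbox": {"xmin": "344", "ymin": "10", "xmax": "422", "ymax": "83"},
        },
    }
}

filename, boxes = boxes_from_voc_dict(sample)
print(filename, boxes)
```

I believe xmltodict.parse also accepts a force_list argument that can serve the same purpose, but check the library's documentation before relying on it.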


oknwwptz3#

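My attempt, which is more readable than the accepted answer, offers the option of converting to 0-based pixel coordinates, and pairs each box's coordinates with the name of the object rather than the name of the file.
Example output: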

{'excavator': [{'xmin': 0, 'ymin': 0, 'xmax': 1265, 'ymax': 587}],
 'dump_truck': [{'xmin': 259, 'ymin': 159, 'xmax': 713, 'ymax': 405}]}

import xml.etree.ElementTree as ET

def read_Pascal_VOC(xml_file,do_0_based):
    # Pascal VOC is 1-based, but more recent formats like MS COCO are 0-based
    # see, e.g., https://github.com/Ricardozzf/maskrcnn-benchmark/commit/da8f99927eb945d3e66985d5e070fb55db472de6
    if do_0_based:
        to_subtract = 1
    else:
        to_subtract = 0

    tree = ET.parse(xml_file)
    root = tree.getroot()

    boxes = dict()

    for box in root.iter('object'):

        name = box.find('name').text
        
        bb = box.find('bndbox')
        
        # dict to remove any ambiguity ordering-wise
        coords = dict(xmin = bb.find('xmin').text, 
                      ymin = bb.find('ymin').text, 
                      xmax = bb.find('xmax').text, 
                      ymax = bb.find('ymax').text)
            
        coords = {k:int(v)-to_subtract for k,v in coords.items()}
        
        # group boxes by object name; the same name may occur more than once
        boxes.setdefault(name, []).append(coords)

    return boxes
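To see how this behaves on the annotation from the question, where both objects share the name chanel, here is a compact, self-contained variant that parses from a string. The name voc_boxes and the zero_based flag are stand-ins made up for this sketch, not part of the function above.

```python
import xml.etree.ElementTree as ET

# The two <object> entries from the question, trimmed to the fields used here.
XML = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name>
    <bndbox><xmin>344</xmin><ymin>10</ymin><xmax>422</xmax><ymax>83</ymax></bndbox>
  </object>
  <object>
    <name>chanel</name>
    <bndbox><xmin>355</xmin><ymin>165</ymin><xmax>443</xmax><ymax>206</ymax></bndbox>
  </object>
</annotation>"""


def voc_boxes(xml_text, zero_based=False):
    """Map each object name to a list of coordinate dicts."""
    offset = 1 if zero_based else 0      # subtract 1 to shift to 0-based pixels
    boxes = {}
    for obj in ET.fromstring(xml_text).iter("object"):
        bb = obj.find("bndbox")
        coords = {k: int(bb.find(k).text) - offset
                  for k in ("xmin", "ymin", "xmax", "ymax")}
        boxes.setdefault(obj.find("name").text, []).append(coords)
    return boxes


as_stored = voc_boxes(XML)
shifted = voc_boxes(XML, zero_based=True)
print(as_stored["chanel"][0])
print(shifted["chanel"][1])
```

Because both boxes are named chanel, they end up in one list under that key, which is why the values are lists rather than single dicts.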
