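A single branch can contain many identical tags. How can I save all of them to a DataFrame?
I tried the code below, but repeated tags such as RowData get overwritten by the data that comes later. My goal is to keep the complete data.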
import pandas as pd
from xml.etree import ElementTree

path = 'data.xml'
with open(path, mode="r", encoding="utf-8") as f:
    xml_file = f.read()

# Strip the wrapper tags so that read_xml sees a flat structure
items_delete = ['<ObjectRelation>', '</ObjectRelation>', '<List>', '</List>',
                '<RowData>', '</RowData>', '<Kind>', '</Kind>']
for item in items_delete:
    xml_file = xml_file.replace(item, '')

df = pd.read_xml(xml_file)
Sample of the initial data:
<ItemList>
<ItemData>
<ObjectRelation>
<ObjectCadastreNr>01000180062</ObjectCadastreNr>
<ObjectType>PARCEL</ObjectType>
</ObjectRelation>
<List>
<RowData>
<Kind>
<KindId>7312050201</KindId>
<KindName>ekspluatācijas aizsargjoslas teritorija gar elektrisko tīklu kabeļu līniju</KindName>
</Kind>
<Nr>1</Nr>
<EstablishDate>1997-02-24</EstablishDate>
<Area>0.0127</Area>
<Measure>ha</Measure>
</RowData>
<RowData>
<Kind>
<KindId>7312040200</KindId>
<KindName>ekspluatācijas aizsargjoslas teritorija gar elektronisko sakaru tīklu gaisvadu līniju</KindName>
</Kind>
<Nr>3</Nr>
<EstablishDate>1996-01-13</EstablishDate>
</RowData>
</List>
</ItemData>
<ItemData>
<ObjectRelation>
<ObjectCadastreNr>01000180062</ObjectCadastreNr>
<ObjectType>PARCEL</ObjectType>
</ObjectRelation>
<List>
<RowData>
<Kind>
<KindId>7312060100</KindId>
<KindName>ekspluatācijas aizsargjoslas teritorija gar pazemes siltumvadu, siltumapgādes iekārtu un būvi</KindName>
</Kind>
<Nr>5</Nr>
<EstablishDate>1997-01-13</EstablishDate>
</RowData>
</List>
</ItemData>
</ItemList>
2 Answers
#1 hc2pp10m
You can try using BeautifulSoup to parse the document.
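A minimal sketch of that approach, assuming `bs4` and `lxml` are installed (the `"xml"` parser depends on `lxml`). The function name `rows_from_xml` is hypothetical, and the `KindName` values are shortened to placeholders; none of this comes from the original answer. It builds one row per `RowData`, so repeated tags no longer overwrite each other.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample data from the question.
# KindName values are shortened placeholders, not the real names.
xml_text = """
<ItemList>
  <ItemData>
    <ObjectRelation>
      <ObjectCadastreNr>01000180062</ObjectCadastreNr>
      <ObjectType>PARCEL</ObjectType>
    </ObjectRelation>
    <List>
      <RowData>
        <Kind><KindId>7312050201</KindId><KindName>name A</KindName></Kind>
        <Nr>1</Nr>
        <EstablishDate>1997-02-24</EstablishDate>
        <Area>0.0127</Area>
        <Measure>ha</Measure>
      </RowData>
      <RowData>
        <Kind><KindId>7312040200</KindId><KindName>name B</KindName></Kind>
        <Nr>3</Nr>
        <EstablishDate>1996-01-13</EstablishDate>
      </RowData>
    </List>
  </ItemData>
</ItemList>
"""


def rows_from_xml(text):
    """Return one dict per <RowData>, with the parent's fields repeated."""
    soup = BeautifulSoup(text, "xml")
    rows = []
    for item in soup.find_all("ItemData"):
        # Fields shared by every RowData under this ItemData.
        relation = item.find("ObjectRelation")
        shared = {}
        if relation is not None:
            for tag in relation.find_all(True, recursive=False):
                shared[tag.name] = tag.get_text(strip=True)
        for row_tag in item.find_all("RowData"):
            row = dict(shared)
            # Keep leaf tags only, so <Kind> is flattened into
            # KindId and KindName instead of becoming its own column.
            for leaf in row_tag.find_all(True):
                if leaf.find(True) is None:
                    row[leaf.name] = leaf.get_text(strip=True)
            rows.append(row)
    return rows


rows = rows_from_xml(xml_text)
```

Passing `rows` to `pd.DataFrame(rows)` should give one DataFrame row per `RowData`; fields missing from a row (such as `Area` in the second one) become NaN.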
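#2 q0qdq0h2
You can find the element and then remove it. Because this is XML, you need to find the parent element before you can remove a child. Below is an idea of how this works, with comments added in the code. Hope this helps!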
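One way this idea might look with the standard library's `xml.etree.ElementTree`, as a sketch rather than the answerer's actual code. The function name `flatten_rows` is hypothetical and the `KindName` values are shortened to placeholders. Each `Kind` wrapper is removed through its parent `RowData` after its children have been moved up one level.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the sample data from the question.
# KindName values are shortened placeholders, not the real names.
xml_text = """
<ItemList>
  <ItemData>
    <ObjectRelation>
      <ObjectCadastreNr>01000180062</ObjectCadastreNr>
      <ObjectType>PARCEL</ObjectType>
    </ObjectRelation>
    <List>
      <RowData>
        <Kind><KindId>7312050201</KindId><KindName>name A</KindName></Kind>
        <Nr>1</Nr>
        <EstablishDate>1997-02-24</EstablishDate>
        <Area>0.0127</Area>
        <Measure>ha</Measure>
      </RowData>
      <RowData>
        <Kind><KindId>7312040200</KindId><KindName>name B</KindName></Kind>
        <Nr>3</Nr>
        <EstablishDate>1996-01-13</EstablishDate>
      </RowData>
    </List>
  </ItemData>
</ItemList>
"""


def flatten_rows(text):
    """Return one dict per <RowData>, with <Kind> flattened into the row."""
    root = ET.fromstring(text)
    rows = []
    for item in root.iter("ItemData"):
        relation = item.find("ObjectRelation")
        shared = {}
        if relation is not None:
            shared = {child.tag: child.text for child in relation}
        for row_el in item.iter("RowData"):
            # ElementTree elements carry no pointer to their parent, so the
            # parent (row_el) is located first and the child removed via it.
            for kind in row_el.findall("Kind"):
                row_el.extend(list(kind))  # move KindId/KindName up a level
                row_el.remove(kind)        # drop the now-redundant wrapper
            row = dict(shared)
            row.update({child.tag: child.text for child in row_el})
            rows.append(row)
    return rows


rows = flatten_rows(xml_text)
```

The resulting list of dicts can be handed to `pd.DataFrame(rows)`, which keeps every `RowData` as its own row instead of letting later values replace earlier ones.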