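I am trying to parse XML with xmltodict, with the goal of eventually converting it into a table format that is easier for other people to read. I have managed to get through most of the XML, but I feel like I am chasing my own tail when I hit an element that has multiple child elements. I would like to use pandas with the values I extract from the XML...
Below is a sanitized version of the XML I am trying to parse: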
<batchConfiguration>
  <batchJob name="BATCHJOB1">
    <className>batchJob1</className>
    <schedule>Y</schedule>
    <interval>300</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB2">
    <params>
      <param name="QueueName1">batchQueue1</param>
    </params>
    <className>batchJob2</className>
    <startTime>02:10:00</startTime>
    <schedule>N</schedule>
    <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB3">
    <params>
      <param name="ignoreErrors">Y</param>
      <param name="batchSize">1000</param>
    </params>
    <className>classyBatchJob</className>
    <schedule>Y</schedule>
    <interval>90</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
</batchConfiguration>
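My idea was that I could loop over the rows that have multiple "params". I can return a single "param" row, but I am stumped when there is more than one. Here is the code I have so far. It comes in several pieces, since I am trying to figure things out as I go. The XML is read from a file...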
import xmltodict as xml
import pprint
#File to parse
fileptr=open(r"FileIRead.xml")
# Show raw XML text file data
raw_file= fileptr.read()
# print(raw_file)
# Create an XML dictionary
xml_dict=xml.parse(raw_file)
pprint.pprint(xml_dict)
xml_dict1=xml.parse(raw_file)['batchConfiguration']['batchJob']
pprint.pprint(xml_dict1)
# pprint.pprint(xml_dict['batchConfiguration']['batchJob'])
# https://docs.python.org/3/tutorial/errors.html
for bJ in xml_dict1:
    bJName=bJ['@name']
    print(f"Name: {bJ['@name']}")
    print(bJName)
    try:
        print(f"Interval: {bJ['interval']}")
    except:
        print("Interval: N/A")
    try:
        print(f"Scheduled: {bJ['schedule']}")
    except:
        print("Scheduled: N/A")
    try:
        print(f"Start Time: {bJ['startTime']}")
    except:
        print("Start Time: N/A")
    try:
        print(f"End Time: {bJ['endTime']}")
    except:
        print("End Time: N/A")
    try:
        # This works fine to return only a single element. With multiple it fails.
        print(f"Params: {bJ['params']['param']['@name']} - {bJ['params']['param']['#text']}")
    except:
        print("Params: N/A")
    try:
        print(f"Classname: {bJ['className']}")
    except:
        print("Classname: N/A")
    try:
        print(f"DaysOfWeek: {bJ['daysOfTheWeek']}")
    except:
        print("DaysOfWeek: N/A")
    try:
        # Attempt to get all parameters single or multiple
        xml_dict2=xml.parse(raw_file)['params']['param']
        pprint.pprint(xml_dict2)
        for bJ1 in xml_dict2['params']['param']:
            print(f"--- {bJ1['@name']}")
    except:
        print("It no worky")
Edit: as requested... the output I have been able to get so far is:
Name: BATCHJOB1
Classname: batchJob1
... (etc)
My end goal is to take the output and convert it into a column format, like this:
Name Classname ...
BATCHJOB1 batchJob1
"N/A" would go wherever the element does not exist or has no value.
2 answers
Answer 1 (eeq64g8w)
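xmltodict returns a dict when there is only one param, and a list when there are two or more. .parse has a force_list parameter that lets you name the keys that should always be lists. You can use: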
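The answer's original snippet is not preserved here, so the following is only a minimal sketch of what such a call might look like. It assumes the sample XML from the question is held in a string (the name SAMPLE_XML and the param_names dict are illustrative, not from the original), and it forces both batchJob and param to be lists so that the loop works regardless of how many of each are present.

```python
import xmltodict

# Sample XML copied from the question; in practice this would be read from the file.
SAMPLE_XML = """<batchConfiguration>
  <batchJob name="BATCHJOB1">
    <className>batchJob1</className>
    <schedule>Y</schedule>
    <interval>300</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB2">
    <params>
      <param name="QueueName1">batchQueue1</param>
    </params>
    <className>batchJob2</className>
    <startTime>02:10:00</startTime>
    <schedule>N</schedule>
    <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB3">
    <params>
      <param name="ignoreErrors">Y</param>
      <param name="batchSize">1000</param>
    </params>
    <className>classyBatchJob</className>
    <schedule>Y</schedule>
    <interval>90</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
</batchConfiguration>
"""

# force_list makes these keys come back as lists even when only one element exists.
config = xmltodict.parse(SAMPLE_XML, force_list=("batchJob", "param"))
jobs = config["batchConfiguration"]["batchJob"]

# Collect the parameter names per job; jobs without <params> get an empty list.
param_names = {}
for job in jobs:
    params = (job.get("params") or {}).get("param", [])
    param_names[job["@name"]] = [p["@name"] for p in params]

print(param_names)
```

With param always a list, the same loop handles zero, one, or many parameters without a special case for the single-element dict.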
Then:
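The follow-up code from the answer is also missing, so this is a hedged sketch of one way to turn the parsed jobs into table rows. It assumes each job's param entry is already a list (for instance because the XML was parsed with force_list), and it uses hand-written dicts shaped like xmltodict output in place of a real parse. The names FIELDS and job_to_row are hypothetical helpers introduced for illustration.

```python
# Hypothetical data shaped like xmltodict output when "param" is forced to a list.
jobs = [
    {"@name": "BATCHJOB1", "className": "batchJob1", "schedule": "Y",
     "interval": "300", "systemControlled": "N"},
    {"@name": "BATCHJOB2",
     "params": {"param": [{"@name": "QueueName1", "#text": "batchQueue1"}]},
     "className": "batchJob2", "startTime": "02:10:00", "schedule": "N",
     "daysOfTheWeek": "YYYYYYY", "systemControlled": "N"},
    {"@name": "BATCHJOB3",
     "params": {"param": [{"@name": "ignoreErrors", "#text": "Y"},
                          {"@name": "batchSize", "#text": "1000"}]},
     "className": "classyBatchJob", "schedule": "Y", "interval": "90",
     "systemControlled": "N"},
]

# Column label -> key in the parsed job dict (labels taken from the question's prints).
FIELDS = {
    "Name": "@name",
    "Classname": "className",
    "Scheduled": "schedule",
    "Interval": "interval",
    "Start Time": "startTime",
    "End Time": "endTime",
    "DaysOfWeek": "daysOfTheWeek",
}


def job_to_row(job):
    """Flatten one job dict into a row, using "N/A" for missing or empty values."""
    row = {label: job.get(key) or "N/A" for label, key in FIELDS.items()}
    params = (job.get("params") or {}).get("param", [])
    joined = "; ".join(f"{p['@name']}={p.get('#text', '')}" for p in params)
    row["Params"] = joined or "N/A"
    return row


rows = [job_to_row(job) for job in jobs]
for row in rows:
    print(row)
```

Since rows is a list of dicts with identical keys, it can be passed to pandas.DataFrame(rows) to get the column layout described in the question.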
Answer 2 (lf5gs5x2)
If I have understood correctly, this can be done using pandas.read_xml():
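The answer's code is not preserved, so what follows is a minimal sketch rather than the author's exact call. It assumes the sample XML is available as a string (SAMPLE_XML is an illustrative name) and uses parser="etree" so that lxml is not needed. As far as I understand, read_xml only flattens the attributes and immediate children of each selected node, so the nested param entries would still need separate handling.

```python
import io

import pandas as pd

# Sample XML copied from the question.
SAMPLE_XML = """<batchConfiguration>
  <batchJob name="BATCHJOB1">
    <className>batchJob1</className>
    <schedule>Y</schedule>
    <interval>300</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB2">
    <params>
      <param name="QueueName1">batchQueue1</param>
    </params>
    <className>batchJob2</className>
    <startTime>02:10:00</startTime>
    <schedule>N</schedule>
    <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
    <systemControlled>N</systemControlled>
  </batchJob>
  <batchJob name="BATCHJOB3">
    <params>
      <param name="ignoreErrors">Y</param>
      <param name="batchSize">1000</param>
    </params>
    <className>classyBatchJob</className>
    <schedule>Y</schedule>
    <interval>90</interval>
    <systemControlled>N</systemControlled>
  </batchJob>
</batchConfiguration>
"""

# One row per <batchJob>; the "name" attribute and child elements become columns.
df = pd.read_xml(io.StringIO(SAMPLE_XML), xpath=".//batchJob", parser="etree")

# Keep the flat columns and show "N/A" where a job lacks the element.
columns = ["name", "className", "schedule", "interval", "startTime", "daysOfTheWeek"]
table = df[columns].astype(object).where(df[columns].notna(), "N/A")

print(table.to_string(index=False))
```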
Output based on the sample XML: