csv - How to convert a .txt file to .xml in Python

Posted by yiytaume on 2023-01-28 in Python

The problem I'm facing now is converting a text file into an XML file. The text file has this format:

Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46

I want to convert it into XML like this:

<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
  <serialnumber/>
  <operatorid>test</operatorid>
  <time>00:03:47 Test Step 2</time>
  <tp1>17.25</tp1>
  <tp2>2.46</tp2>
 </filedata>
...
</root>

I used code like the following to convert my previous text files to XML, but now I'm stuck on splitting the lines.

import itertools as it
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

root = ET.Element('root')
with open('text.txt') as f:
    lines = f.read().splitlines()

celldata = ET.SubElement(root, 'filedata')
# groupby collapses runs of identical consecutive lines (e.g. several blank lines in a row)
for line, _group in it.groupby(lines):
    if not line:
        # A blank line starts a new <filedata> record
        celldata = ET.SubElement(root, 'filedata')
    else:
        # Old format: one "Key: value" pair per line; the key becomes the tag name
        tag = line.split(":")
        el = ET.SubElement(celldata, tag[0].replace(" ", ""))
        tag = ' '.join(tag[1:]).strip()
        if 'File Name' in line:
            tag = line.split("\\")[-1].strip()
        elif 'File Size' in line:
            # list() is needed on Python 3, where filter() returns an iterator
            splist = list(filter(None, line.split(" ")))
            tag = splist[splist.index('Low:') + 1]
            # splist[splist.index('High:') + 1]
        el.text = tag

formatted_xml = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ", encoding='utf-8').strip()

with open("test.xml", "wb") as f:
    f.write(formatted_xml)
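The quick check below shows why the loop above breaks on the new format. It runs the same `split(":")` logic on one line copied from the sample. The line holds five "label: value" pairs, not one, and the Time value contains colons of its own. As a result, everything after the first label is lumped together with its colons removed.

```python
# One line in the new format, copied from the sample in the question.
line = ("Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2"
        "      TP1:  17.25    TP2:  2.46")

# Same logic as the loop above: split on every colon.
parts = line.split(":")
tag_name = parts[0].replace(" ", "")
value = " ".join(parts[1:]).strip()

print(len(parts))  # 8 pieces: 5 label colons plus the 2 colons inside 00:03:47
print(tag_name)    # only the first label becomes a tag
print(value)       # the remaining labels and values, with the time's colons lost
```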

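I've seen a similar question on Stack Overflow ("Python text file to xml"), but the problem is that I can't convert this file to .csv format, because it is generated by a machine. If anyone knows how to solve this, please help. Thanks.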

Answer by p8ekf7hl

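Here is a better way to split the lines.
Note that the text variable stands in for your .txt file. I deliberately modified its contents so the output gives more context.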

from collections import OrderedDict
from pprint import pprint

# text stands in for the contents of the loaded .txt file.
text = """Serial Number:  test    Operator ID:  test1  Time:  00:03:47 Test Step 1      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test2  Time:  00:03:48 Test Step 2      TP1:  17.24    TP2:  2.47"""

# The labels that mark the break-points in each line of the text file.
headers = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]

information = []

# Process the text one line at a time.
for line in text.splitlines():

    # Replace every label with the first one and split on it,
    # so only the value belonging to each label remains.
    default_header = headers[0]
    for header in headers[1:]:
        line = line.replace(header, default_header)
    values = [v.strip() for v in line.split(default_header)][1:]

    # Pair each label with its value in an OrderedDict.
    compiled_information = OrderedDict()
    for header, value in zip(headers, values):
        compiled_information[header] = value

    # Append the record to the overall information list.
    information.append(compiled_information)

# Pretty-print the result (optional, only for a clearer view of the data).
pprint(information)

Output:

[OrderedDict([('Serial Number:', 'test'),
              ('Operator ID:', 'test1'),
              ('Time:', '00:03:47 Test Step 1'),
              ('TP1:', '17.25'),
              ('TP2:', '2.46')]),
 OrderedDict([('Serial Number:', ''),
              ('Operator ID:', 'test2'),
              ('Time:', '00:03:48 Test Step 2'),
              ('TP1:', '17.24'),
              ('TP2:', '2.47')])]
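If the labels are fixed and always appear in the same order, a regular expression is another option (not part of the answer above) for the same split. This sketch assumes that fixed order. `parse_line`, `HEADERS` and `PATTERN` are hypothetical names, and the dictionary keys drop the trailing colon.

```python
import re

# Assumption: every line contains these five labels, each followed by a colon,
# always in this order.
HEADERS = ["Serial Number", "Operator ID", "Time", "TP1", "TP2"]

# Builds: Serial Number:(.*?)Operator ID:(.*?)Time:(.*?)TP1:(.*?)TP2:(.*)
# Each lazy group stops at the next label, so colons inside a value
# (such as 00:03:47) are left alone.
PATTERN = re.compile(
    "".join(re.escape(h) + ":(.*?)" for h in HEADERS[:-1])
    + re.escape(HEADERS[-1]) + ":(.*)"
)


def parse_line(line):
    """Return {label: value} for one line, or None if the layout doesn't match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {label: value.strip() for label, value in zip(HEADERS, match.groups())}


text = """Serial Number:  test    Operator ID:  test1  Time:  00:03:47 Test Step 1      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test2  Time:  00:03:48 Test Step 2      TP1:  17.24    TP2:  2.47"""

records = [parse_line(line) for line in text.splitlines()]
for record in records:
    print(record)
```

The replace-and-split approach silently pairs values with the wrong labels when a line is missing a label. Here, such a line returns None instead, which makes malformed lines easier to spot.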

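This approach should work better than what you have now. The idea behind the code is one I kept from another project. I suggest reading through it carefully and *understanding* its logic.
From here you should be able to loop through the information list and create your custom .xml file. I'd also suggest looking at dicttoxml, since it may make that final step easier.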
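Here is a minimal sketch of that last step. It uses the standard library's xml.etree.ElementTree instead of dicttoxml. It assumes information is a list of OrderedDicts keyed like the output above, and that tag names are the labels lowercased with spaces and the colon removed, as in the desired XML in the question. `header_to_tag` and `information_to_xml` are hypothetical helper names, the sample record is made up from the question's data, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def header_to_tag(header):
    # "Serial Number:" -> "serialnumber", "TP1:" -> "tp1"
    return header.rstrip(":").replace(" ", "").lower()


def information_to_xml(information):
    # One <filedata> element per parsed line, one child element per label.
    root = ET.Element("root")
    for record in information:
        filedata = ET.SubElement(root, "filedata")
        for header, value in record.items():
            ET.SubElement(filedata, header_to_tag(header)).text = value
    return root


# Sample record in the same shape as the parsing step's output.
information = [
    OrderedDict([("Serial Number:", ""), ("Operator ID:", "test"),
                 ("Time:", "00:03:47 Test Step 2"), ("TP1:", "17.25"), ("TP2:", "2.46")]),
]

root = information_to_xml(information)
ET.indent(root, space=" ")  # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
ET.ElementTree(root).write("test.xml", encoding="utf-8", xml_declaration=True)
```

An empty value such as the blank serial number is written as a self-closing element (`<serialnumber />`).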
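As for your code, keep this in mind: it is much easier to split a job into basic tasks than to try to merge them all into one. By building the XML file while you are still splitting up the .txt file, you've created a monster that is hard to deal with when it comes back with bugs. Instead, take it one step at a time: create "checkpoints" you are 100% sure are finished, then move on to the next task.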