How to clean a text file to export it as JSON - Python

Posted by wr98u20j on 2022-12-01 in Python

I have the following text file, produced by the LFT command.

2  [14080] [100.0.0.0 - 100.255.255.255] 100.5.254.150 6.3ms
3  [14080] [100.0.0.0 - 100.255.255.255] 100.8.254.149 5.7ms
4  [15169] [GOOGLE] 142.250.164.139 17.5ms
5  [15169] [GOOGLE] 142.250.164.138 10.9ms
6  [15169] [GOOGLE] 72.14.233.63 12.8ms
7  [15169] [GOOGLE] 142.250.210.131 9.6ms
8  [15169] [GOOGLE]  142.250.78.78 11.9ms

Each space can be treated as a field separator. I tried to convert this text file into a JSON file, but I got:

{
    "emp1": {
        "Jumps": "2",
        "System": "[14080]",
        "Adress": "[100.0.0.0",
        "IP": "-",
        "Delay": "100.255.255.255] 100.5.254.150 6.3ms"
    },
    "emp2": {
        "Jumps": "3",
        "System": "[14080]",
        "Adress": "[100.0.0.0",
        "IP": "-",
        "Delay": "100.255.255.255] 100.8.254.149 5.7ms"
    },
    "emp3": {
        "Jumps": "4",
        "System": "[15169]",
        "Adress": "[GOOGLE]",
        "IP": "142.250.164.139",
        "Delay": "17.5ms"
    },
    "emp4": {
        "Jumps": "5",
        "System": "[15169]",
        "Adress": "[GOOGLE]",
        "IP": "142.250.164.138",
        "Delay": "10.9ms"
    },
    "emp5": {
        "Jumps": "6",
        "System": "[15169]",
        "Adress": "[GOOGLE]",
        "IP": "72.14.233.63",
        "Delay": "12.8ms"
    },
    "emp6": {
        "Jumps": "7",
        "System": "[15169]",
        "Adress": "[GOOGLE]",
        "IP": "142.250.210.131",
        "Delay": "9.6ms"
    },
    "emp7": {
        "Jumps": "8",
        "System": "[15169]",
        "Adress": "[GOOGLE]",
        "IP": "142.250.78.78",
        "Delay": "11.9ms"
    }
}

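As you can see, the first two entries are wrong: the address range is split across "Adress", "IP" and "Delay", so "Delay" ends up holding the rest of the line.
How can I fix this? What can I do?
I also tried Pandas, but I get the same result: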
import pandas as pd

data = pd.read_csv("file.txt", sep=r'\s+')
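Both attempts treat every run of whitespace as a separator (that is what sep=r'\s+' does in read_csv), but the bracketed address range itself contains spaces. A quick check of the token counts on a few of the sample lines shows why the columns cannot line up:

```python
sample = """\
2  [14080] [100.0.0.0 - 100.255.255.255] 100.5.254.150 6.3ms
3  [14080] [100.0.0.0 - 100.255.255.255] 100.8.254.149 5.7ms
4  [15169] [GOOGLE] 142.250.164.139 17.5ms
8  [15169] [GOOGLE]  142.250.78.78 11.9ms"""

# Count the whitespace-separated tokens on each line
counts = [len(line.split()) for line in sample.splitlines()]
print(counts)  # [7, 7, 5, 5]
```

Lines with an address range yield 7 tokens while lines with a name like [GOOGLE] yield 5, so any fixed five-column mapping shifts the range lines.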

Answer #1 (o2gm4chl)

You can try parsing the text with the re module:

text = """\
2  [14080] [100.0.0.0 - 100.255.255.255] 100.5.254.150 6.3ms
3  [14080] [100.0.0.0 - 100.255.255.255] 100.8.254.149 5.7ms
4  [15169] [GOOGLE] 142.250.164.139 17.5ms
5  [15169] [GOOGLE] 142.250.164.138 10.9ms
6  [15169] [GOOGLE] 72.14.233.63 12.8ms
7  [15169] [GOOGLE] 142.250.210.131 9.6ms
8  [15169] [GOOGLE]  142.250.78.78 11.9ms"""

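import re

# (?m) makes ^ match at the start of every line.
# Groups: hop number, text inside the first [...], text inside the second [...],
# then two whitespace-separated fields (IP address and delay).
pat = re.compile(r"(?m)^\s*(\d+)\s*\[(.*?)\]\s*\[(.*?)\]\s*(\S+)\s*(\S+)")

out = {}
# findall returns one tuple of the five captured groups per matching line
for i, t in enumerate(pat.findall(text), 1):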
    out[f"emp{i}"] = {
        "Jumps": t[0],
        "System": t[1],
        "Adress": t[2],
        "IP": t[3],
        "Delay": t[4],
    }

print(out)

Prints (reformatted for readability):

{
    "emp1": {
        "Jumps": "2",
        "System": "14080",
        "Adress": "100.0.0.0 - 100.255.255.255",
        "IP": "100.5.254.150",
        "Delay": "6.3ms",
    },
    "emp2": {
        "Jumps": "3",
        "System": "14080",
        "Adress": "100.0.0.0 - 100.255.255.255",
        "IP": "100.8.254.149",
        "Delay": "5.7ms",
    },
    "emp3": {
        "Jumps": "4",
        "System": "15169",
        "Adress": "GOOGLE",
        "IP": "142.250.164.139",
        "Delay": "17.5ms",
    },
    "emp4": {
        "Jumps": "5",
        "System": "15169",
        "Adress": "GOOGLE",
        "IP": "142.250.164.138",
        "Delay": "10.9ms",
    },
    "emp5": {
        "Jumps": "6",
        "System": "15169",
        "Adress": "GOOGLE",
        "IP": "72.14.233.63",
        "Delay": "12.8ms",
    },
    "emp6": {
        "Jumps": "7",
        "System": "15169",
        "Adress": "GOOGLE",
        "IP": "142.250.210.131",
        "Delay": "9.6ms",
    },
    "emp7": {
        "Jumps": "8",
        "System": "15169",
        "Adress": "GOOGLE",
        "IP": "142.250.78.78",
        "Delay": "11.9ms",
    },
}
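The same pattern can also be written with re.VERBOSE and named groups, which documents each field and lets groupdict() build each record directly; json.dumps then yields real JSON text (the question's goal) instead of a Python dict repr. This is a sketch under two assumptions: the group names double as the output keys (keeping the original "Adress" spelling), and the output filename hops.json is hypothetical.

```python
import json
import re

text = """\
2  [14080] [100.0.0.0 - 100.255.255.255] 100.5.254.150 6.3ms
3  [14080] [100.0.0.0 - 100.255.255.255] 100.8.254.149 5.7ms
4  [15169] [GOOGLE] 142.250.164.139 17.5ms
8  [15169] [GOOGLE]  142.250.78.78 11.9ms"""

pat = re.compile(
    r"""
    ^\s*(?P<Jumps>\d+)       # hop number
    \s*\[(?P<System>.*?)\]   # AS number inside the first brackets
    \s*\[(?P<Adress>.*?)\]   # AS name or address range inside the second brackets
    \s*(?P<IP>\S+)           # IP address of the hop
    \s*(?P<Delay>\S+)        # delay, e.g. 6.3ms
    """,
    re.MULTILINE | re.VERBOSE,
)

# One dict per matching line, keyed emp1, emp2, ...
out = {f"emp{i}": m.groupdict() for i, m in enumerate(pat.finditer(text), 1)}

json_text = json.dumps(out, indent=4)
print(json_text)

# To write a file instead (hypothetical filename):
# with open("hops.json", "w") as f:
#     json.dump(out, f, indent=4)
```

In verbose mode, unescaped whitespace and # comments inside the pattern are ignored, so the annotated layout matches exactly the same text as the one-line version.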

Answer #2 (mzmfm0qo)

Andrej's answer is already spot-on; I just wanted to add an alternative:

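with open("textfile.txt", 'r') as f:
    s = f.readlines()

data = {}
for i, value in enumerate(s, 1):
    # Split on any whitespace; the trailing newline is dropped as well
    t = value.split()
    data[f"emp{i}"] = {
        "Jumps": t[0],
        "System": t[1],
        # 5 tokens: one-word address such as "[GOOGLE]";
        # 7 tokens: a range "[a - b]" spread over tokens 2..4
        "Adress": t[2] if len(t) == 5 else ''.join(t[2:5]),
        "IP": t[-2],
        "Delay": t[-1]}

print(data)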

Printing data gives (line breaks added for readability):

{
 'emp1': {
     'Jumps': '2',
     'System': '[14080]',
     'Adress': '[100.0.0.0-100.255.255.255]',
     'IP': '100.5.254.150',
     'Delay': '6.3ms'},
 'emp2': {
     'Jumps': '3',
     'System': '[14080]',
     'Adress': '[100.0.0.0-100.255.255.255]',
     'IP': '100.8.254.149',
     'Delay': '5.7ms'},
 'emp3': {
     'Jumps': '4',
     'System': '[15169]',
     'Adress': '[GOOGLE]',
     'IP': '142.250.164.139',
     'Delay': '17.5ms'},
 'emp4': {
     'Jumps': '5',
     'System': '[15169]',
     'Adress': '[GOOGLE]',
     'IP': '142.250.164.138',
     'Delay': '10.9ms'},
 'emp5': {
     'Jumps': '6',
     'System': '[15169]',
     'Adress': '[GOOGLE]',
     'IP': '72.14.233.63',
     'Delay': '12.8ms'},
 'emp6': {
     'Jumps': '7',
     'System': '[15169]',
     'Adress': '[GOOGLE]',
     'IP': '142.250.210.131',
     'Delay': '9.6ms'},
 'emp7': {
     'Jumps': '8',
     'System': '[15169]',
     'Adress': '[GOOGLE]',
     'IP': '142.250.78.78',
     'Delay': '11.9ms'}
}
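A variant of the split approach that does not depend on counting tokens (5 vs 7): split the first two fields off the left and the last two off the right, so whatever remains in the middle is the address, spaces included. This sketch assumes every line has exactly that layout; the helper name parse_line is hypothetical, and it strips the brackets the way the regex answer does.

```python
def parse_line(line):
    """Split one LFT hop line into its five fields."""
    # Left side: hop number, AS number, and the rest of the line
    jumps, system, rest = line.split(maxsplit=2)
    # Right side: peel the IP and delay off the end; the middle is the address
    address, ip, delay = rest.rsplit(maxsplit=2)
    return {
        "Jumps": jumps,
        "System": system.strip("[]"),
        "Adress": address.strip("[]"),
        "IP": ip,
        "Delay": delay,
    }


lines = [
    "2  [14080] [100.0.0.0 - 100.255.255.255] 100.5.254.150 6.3ms",
    "8  [15169] [GOOGLE]  142.250.78.78 11.9ms",
]

data = {f"emp{i}": parse_line(line) for i, line in enumerate(lines, 1)}
print(data)
```

Because split() and rsplit() without a separator collapse runs of whitespace and ignore a trailing newline, the double space before 142.250.78.78 and the line endings from readlines() need no special handling.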
