Parsing a list with HTML-like elements into nested JSON with Python

Posted by ctrmrzij on 2022-12-05 in Python

I'm not good at converting parts of a list into nested JSON and would appreciate some guidance. I have a list containing data like this:

['<h5> 1|',
 '<h6>Type of Care|',
 '<h6>SA|Substance use treatment|',
 '<h6>DT|Detoxification |',
 '<h6>HH|Transitional housing, halfway house, or sober home|',
 '<h6>SUMH|Treatment for co-occurring serious mental health | illness/serious emotional disturbance and substance | use disorders|',
 '',
 '<h5> 2|',
 '<h6>Telemedicine|',
 '<h6>TELE|TelemedicineTelemedicine/telehealth|',
 '']

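First I want to remove every record in the list that has no content. Then I want to turn the records containing a tag like "<h5>" into keys, and group the records containing "<h6>" as their values, as in the JSON output below: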

"codekey": [
                {
                    "category": [
                        {
                            "key": 1,
                            "value": "Type of Care"
                        }
                    ],
                    "codes": [
                        {
                            "key": "SA",
                            "value": "Substance use treatment"
                        },
                        {
                            "key": "DT",
                            "value": "Detoxification"
                        },
                        {
                            "key": "HH",
                            "value": "Transitional housing, halfway house, or sober home"
                        },
                        {
                            "key": "SUMH",
                            "value": "Treatment for co-occurring serious mental health | illness/serious emotional disturbance and substance | use disorders|"
                        }
                    ]
                },
                {
                    "category": [
                        {
                            "key": 2,
                            "value": "Telemedicine"
                        }
                    ],
                    "codes": [
                        {
                            "key": "TELE",
                            "value": "TelemedicineTelemedicine/telehealth"
                        }
                    ]
                }
            ]

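I think I need a loop, but I'm having trouble working out how to build the key/value relationship. I suspect I also need regular expressions, but I'm not the strongest in Python and can't work out conceptually how to turn the data into the desired output. Are there any suggestions for training that would help me do this, or any initial pointers on how to get started? Thanks!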

ogq8wdun1#

Assuming your format stays the same, here is a flexible, configurable solution:

# Assumes `data` is the list shown in the question.
class Separator:
    def __init__(self, data, title, sep, splitter):
        self.data = data          # the input list
        self.title = title        # tag that starts a group ("<h5>" here)
        self.sep = sep            # record that closes a group ("" here)
        self.splitter = splitter  # delimiter between key and value ("|" here)
        self.res = []             # finished groups
        self.tempDict = {}        # group currently being built

    def clearString(self, string, *args):
        for arg in args:
            string = string.replace(arg, '')  # remove every occurrence of arg
        return string.strip()

    def updateDict(self, val):
        if val == self.sep:
            self.res.append(self.tempDict)  # group finished: store it
            self.tempDict = {}              # start a fresh group
        else:
            try:
                if self.title in val:  # an "<h5>" record starts a category
                    key = self.clearString(val, self.title, self.splitter)
                    # The record right after the "<h5>" holds the category name.
                    name = self.clearString(self.data[self.data.index(val) + 1], '<h6>', self.splitter)
                    self.tempDict["category"] = [{"key": key, "value": name}]
                elif self.tempDict["category"][0]["value"] != self.clearString(val, '<h6>', self.splitter):
                    # Any "<h6>" record other than the category name is a code.
                    val = self.clearString(val, "<h6>").split(self.splitter)
                    if "codes" not in self.tempDict:
                        self.tempDict["codes"] = []  # create the list on first use
                    self.tempDict["codes"].append({"key": val[0], "value": val[1]})
            except (KeyError, IndexError):  # skip records that do not fit the layout
                pass
        return self.res


separator = Separator(data, '<h5>', '', '|')  # renamed from `object`, which shadows a builtin
for val in data:
    res = separator.updateDict(val)
print(res)

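Output for the provided sample input. Note that each value is cut at its first inner "|", so the SUMH description is truncated, that values keep trailing whitespace ('Detoxification '), and that category keys are strings rather than integers: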

[
    {
        'category': [{'key': '1', 'value': 'Type of Care'}],
        'codes': [
            {'key': 'SA', 'value': 'Substance use treatment'},
            {'key': 'DT', 'value': 'Detoxification '},
            {
                'key': 'HH',
                'value': 'Transitional housing, halfway house, or sober home',
            },
            {
                'key': 'SUMH',
                'value': 'Treatment for co-occurring serious mental health ',
            },
        ],
    },
    {
        'category': [{'key': '2', 'value': 'Telemedicine'}],
        'codes': [
            {'key': 'TELE', 'value': 'TelemedicineTelemedicine/telehealth'},
        ],
    },
]
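
The output above differs from the requested JSON in a few details (string category keys, a truncated SUMH value, trailing whitespace). As a rough alternative sketch, not the answerer's method, the same grouping can be done with one loop and a small regular expression. The names `TAG_LINE` and `parse_codekey` are made up for illustration, and the sketch assumes every "<h5>" record holds an integer and that the first "<h6>" record after it is the category name. It keeps inner "|" characters in a value but drops the trailing "|" terminator, which the requested output left in place for SUMH.

```python
import json
import re

# Hypothetical names; the tags and the "|" delimiter come from the sample data.
TAG_LINE = re.compile(r"^<(h5|h6)>(.*)$")


def parse_codekey(lines):
    """Group <h6> records under the <h5> record that precedes them."""
    groups = []
    current = None
    for line in lines:
        match = TAG_LINE.match(line.strip())
        if match is None:
            continue  # empty or untagged record: skip it
        tag, body = match.groups()
        # Drop surrounding whitespace and the trailing "|" terminator.
        body = body.strip().rstrip("|").strip()
        if tag == "h5":
            # A new <h5> opens a new group; its number is the category key.
            current = {"category": [{"key": int(body), "value": None}], "codes": []}
            groups.append(current)
        elif current is None:
            continue  # <h6> before any <h5>: nothing to attach it to
        elif current["category"][0]["value"] is None:
            # Assumption: the first <h6> after an <h5> is the category name.
            current["category"][0]["value"] = body
        else:
            # Split on the first "|" only, so inner "|" stay in the value.
            key, _, value = body.partition("|")
            current["codes"].append({"key": key.strip(), "value": value.strip()})
    return {"codekey": groups}


data = [
    '<h5> 1|',
    '<h6>Type of Care|',
    '<h6>SA|Substance use treatment|',
    '<h6>DT|Detoxification |',
    '<h6>HH|Transitional housing, halfway house, or sober home|',
    '<h6>SUMH|Treatment for co-occurring serious mental health | illness/serious emotional disturbance and substance | use disorders|',
    '',
    '<h5> 2|',
    '<h6>Telemedicine|',
    '<h6>TELE|TelemedicineTelemedicine/telehealth|',
    '',
]

result = parse_codekey(data)
print(json.dumps(result, indent=4))
```

Because a new group starts at each "<h5>" record rather than ending at an empty record, the last group is kept even if the list has no trailing empty string.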
