How do I create a nested dict (JSON) from a database schema XML file?

Posted by a9wyjsp7 on 2023-08-08 in Other

Here is my input, file.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<project name="so_project" id="Project-9999">
    <schema name="database1">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details_resolution" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="user_id" type="string"/>
                <column name="user_name" type="string"/>
            </column>
            <column name="details_closure" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="auto_closure" type="bool"/>
            </column>
        </table>
    </schema>
    <schema name="database2">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="value" type="float"/>
            </column>
        </table>
    </schema>
</project>

I'm trying to build a classic nested dict from it:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string"
            },
            "details_closure": {
                "timestamp": "timestamp",
                "auto_closure": "bool"
            }
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {
                "timestamp": "timestamp",
                "value": "float"
            }
        }
    }
}

PS: each database may end up with multiple tables.

I tried some AI-generated code, but none of it gave me the expected result.
Sorry, folks, I can't show my attempts!
Any help would be much appreciated.

#1 6vl6ewon

A solution using BeautifulSoup:

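from bs4 import BeautifulSoup

# The "xml" parser requires lxml to be installed.
with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")

def parse_columns(t):
    # Map each direct child <column> to its type, or to a nested dict
    # when the column itself contains <column> children.
    out = {}
    for c in t.find_all("column", recursive=False):
        if c.find("column"):
            out[c["name"]] = parse_columns(c)
        else:
            out[c["name"]] = c["type"]
    return out

def parse_schema(sch):
    # One entry per <table> in the schema, keyed by table name.
    out = {}
    for t in sch.select("table"):
        out[t["name"]] = parse_columns(t)
    return out

# One entry per <schema>, keyed by schema name.
out = {}
for sch in soup.select("schema"):
    out[sch["name"]] = parse_schema(sch)

print(out)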

Prints:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string",
            },
            "details_closure": {"timestamp": "timestamp", "auto_closure": "bool"},
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    },
}
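
Since the question title asks for JSON, the resulting dict can be serialized with the standard `json` module. The following is only a minimal sketch: the `nested` literal is a hand-copied subset of the expected result, not actual parser output, and the variable names are chosen here for illustration.

```python
import json

# Hand-copied subset of the expected result, used only for illustration.
nested = {
    "database2": {
        "table1": {
            "foo": "int",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    }
}

# indent=4 pretty-prints the structure; key order follows insertion order.
json_text = json.dumps(nested, indent=4)
print(json_text)

# Round-trip check: parsing the text gives back an equal dict.
assert json.loads(json_text) == nested
```

To write straight to a file, `json.dump(nested, f, indent=4)` on an open file handle does the same job.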

#2 qlfbtfca

You can use xml.etree.ElementTree:

import xml.etree.ElementTree as ET

def parse_columns(parent_elem):
    # Map each direct child <column> to its type, or to a nested dict
    # when the column itself contains <column> children.
    columns = {}
    for column_elem in parent_elem.findall('column'):
        name = column_elem.get('name')
        if column_elem.find('column') is not None:
            columns[name] = parse_columns(column_elem)
        else:
            columns[name] = column_elem.get('type')
    return columns

def parse_table(table_elem):
    # Return a single-entry dict: {table name: its (possibly nested) columns}.
    return {table_elem.get('name'): parse_columns(table_elem)}

def parse_schema(schema_elem):
    schema_data = {}
    schema_name = schema_elem.get('name')
    for table_elem in schema_elem.findall('table'):
        table_data = parse_table(table_elem)
        schema_data.update(table_data)
    return {schema_name: schema_data}

def parse_xml(xml_content):
    root = ET.fromstring(xml_content)
    project_data = {}
    for schema_elem in root.findall('schema'):
        schema_data = parse_schema(schema_elem)
        project_data.update(schema_data)
    return project_data

# Read XML file
with open('file.xml', 'r') as f:
    xml_content = f.read()

# Parse XML and generate nested dictionary
nested_dict = parse_xml(xml_content)
print(nested_dict)


#3 6mzjoqzu

In XSLT 3.0:

<xsl:output method="json" indent="yes" />
  
  <xsl:template match="/">
    <xsl:map>
      <xsl:apply-templates select="*/schema"/>
    </xsl:map>
  </xsl:template>

  <xsl:template match="*[*]">
     <xsl:map-entry key="string(@name)">
        <xsl:map>
          <xsl:apply-templates select="*"/>
        </xsl:map>
     </xsl:map-entry>
  </xsl:template>
  
  <xsl:template match="*">
    <xsl:map-entry key="string(@name)" select="string(@type)"/>
  </xsl:template>

See https://xsltfiddle.liberty-development.net/bdvWh3 for the complete stylesheet, including the boilerplate.
Explanation:

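  • The first template rule matches the document node, creates the outermost map, and processes the schema elements, skipping the project level.
  • The second template rule matches any element that has one or more child elements; it creates a map entry for that container element with @name as the key, and builds a nested map as the value by recursively applying the template rules to the children.
  • The third template rule matches elements with no children; it creates a map entry for that element with @name as the key and @type as the value.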
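
For comparison, the same three rules can be mirrored in Python with the standard library. This is a rough sketch rather than a translation of the stylesheet: the names `to_entry`, `project_to_dict`, and `SAMPLE` are hypothetical, and the sample XML is a shortened variant of the question's file with a second table added to exercise the multiple-tables case.

```python
import xml.etree.ElementTree as ET

def to_entry(elem):
    """Return a dict for an element with children, else its type attribute."""
    children = list(elem)
    if children:
        # Rule 2: container element -> nested dict keyed by each child's name.
        return {child.get("name"): to_entry(child) for child in children}
    # Rule 3: leaf element -> its type attribute (None if the attribute is absent).
    return elem.get("type")

def project_to_dict(xml_text):
    root = ET.fromstring(xml_text)
    # Rule 1: skip the <project> level and start from its <schema> children.
    return {s.get("name"): to_entry(s) for s in root.findall("schema")}

# Hypothetical sample: a shortened variant of the question's input.
SAMPLE = """<project name="so_project">
  <schema name="database1">
    <table name="table1">
      <column name="foo" type="int"/>
      <column name="details" type="array[object]">
        <column name="value" type="float"/>
      </column>
    </table>
    <table name="table2">
      <column name="bar" type="string"/>
    </table>
  </schema>
</project>"""

result = project_to_dict(SAMPLE)
print(result)
```

Like the stylesheet, this sketch does not look at element names at all, so it relies on the input containing only schema, table, and column elements below the root.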
