How to iterate over all levels of an XML/JSON/dict in Python?

Posted by ahy6op9u on 2022-12-15 in Python

I'm getting the following XML response from an API endpoint:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Events>
    <eventData>
        <eventID>32669037</eventID>
        <userID>
            <loginID>userone</loginID>
            <userDN>cn=userone,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>Logout</type>
        <ipAddress>1.2.3.4</ipAddress>
        <status>success</status>
        <accessTime>2022-12-04T09:56:39.678Z</accessTime>
        <ECID>abcdefgh</ECID>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
    <eventData>
        <eventID>62669036</eventID>
        <userID>
            <loginID>usertwo</loginID>
            <userDN>cn=usertwo,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>CredentialValidation</type>
        <ipAddress>5.6.7.8</ipAddress>
        <status>success</status>
        <accessTime>2022-12-04T09:53:06.779Z</accessTime>
        <ECID>adfxx^^sfdffd</ECID>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
</Events>

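My goal is to flatten each "eventData" element into a single row for loading into a table, so the snippet above should produce 2 rows.
I tried xmltodict and iterating over the resulting dict, but I can't get at the lower-level values. The same happens with json.dumps followed by json.loads. Converting to a DataFrame is no better: I can't seem to write a for loop that walks through the whole structure and also reaches the values of the lower-level members.
How can I do this?
Every time I try a for loop on one of these object types, it always seems to stay at the top level (Events).
Also, if I try to read a value, e.g. print(['Events']['eventData'][0]['eventID']), or otherwise access lower-level members, I can iterate, but I can't work out how to define the loop's range so that it goes beyond [0].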
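A minimal sketch of the dict-walking approach described above: the trick is recursion, so one function handles every level, plus looping over the eventData list instead of indexing [0]. The helper flatten is hypothetical (not part of xmltodict or any library), and data is a hand-written dict assumed to mirror the nesting xmltodict.parse() returns for this response (two <eventData> elements become a list), trimmed to a few fields.

```python
from typing import Any, Dict

UA = (
    "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 "
    "(KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1"
)

# Hand-written stand-in for the parsed response (assumed to match the nesting
# xmltodict produces; ipAddress, accessTime and ECID omitted for brevity)
data = {
    "Events": {
        "eventData": [
            {
                "eventID": "32669037",
                "userID": {
                    "loginID": "userone",
                    "userDN": "cn=userone,cn=Users,dc=us,dc=users,dc=com",
                },
                "type": "Logout",
                "status": "success",
                "attributeMap": {"attribute": {"key": "User-Agent", "value": UA}},
            },
            {
                "eventID": "62669036",
                "userID": {
                    "loginID": "usertwo",
                    "userDN": "cn=usertwo,cn=Users,dc=us,dc=users,dc=com",
                },
                "type": "CredentialValidation",
                "status": "success",
                "attributeMap": {"attribute": {"key": "User-Agent", "value": UA}},
            },
        ]
    }
}


def flatten(node: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: recursively walk nested dicts/lists into one level.

    Nested keys are joined with `sep`; list items contribute their index,
    so every leaf value ends up under a unique path.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Leaf (a string, or None for an empty element): stop recursing
        return {prefix: node}
    flat: Dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, path, sep))
    return flat


# Loop over the list of events instead of hard-coding index [0]
rows = [flatten(event) for event in data["Events"]["eventData"]]
for row in rows:
    print(row)
```

With the real response you would start from data = xmltodict.parse(response_text), where response_text is whatever holds the raw XML. One caveat worth checking in the xmltodict documentation: if a response contains only one <eventData>, xmltodict returns a dict rather than a list for it, so the loop would iterate over that event's keys; its force_list option (e.g. force_list=("eventData",)) is intended to keep it a list.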

Answer #1 by nbnkbykc

You can try using BeautifulSoup to parse the XML document into a DataFrame:

import pandas as pd
from bs4 import BeautifulSoup

xml_doc = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Events>
    <eventData>
        <eventID>32669037</eventID>
        <userID>
            <loginID>userone</loginID>
            <userDN>cn=userone,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>Logout</type>
        <ipAddress>1.2.3.4</ipAddress>
        <status>success</status>
        <accessTime>2022-12-04T09:56:39.678Z</accessTime>
        <ECID>abcdefgh</ECID>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
    <eventData>
        <eventID>62669036</eventID>
        <userID>
            <loginID>usertwo</loginID>
            <userDN>cn=usertwo,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>CredentialValidation</type>
        <ipAddress>5.6.7.8</ipAddress>
        <status>success</status>
        <accessTime>2022-12-04T09:53:06.779Z</accessTime>
        <ECID>adfxx^^sfdffd</ECID>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
</Events>"""

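# The "xml" parser needs the lxml package; := needs Python 3.8+
soup = BeautifulSoup(xml_doc, "xml")

df = pd.DataFrame(
    [
        {
            # Key: tag name of each direct child of <eventData>.
            # Value: its text nodes joined with "|" and split again, so a leaf
            # gives a single string and a nested element such as <userID>
            # gives a list (assumes no value itself contains "|")
            t.name: tmp
            if len(tmp := t.get_text(strip=True, separator="|").split("|")) > 1
            else tmp[0]
            for t in d.find_all(recursive=False)
        }
        # One dict, i.e. one DataFrame row, per <eventData> element
        for d in soup.find_all("eventData")
    ]
)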
print(df)

Prints:

    eventID                                                userID                  type ipAddress   status                accessTime           ECID                                                                                                                         attributeMap
0  32669037  [userone, cn=userone,cn=Users,dc=us,dc=users,dc=com]                Logout   1.2.3.4  success  2022-12-04T09:56:39.678Z       abcdefgh  [User-Agent, Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1]
1  62669036  [usertwo, cn=usertwo,cn=Users,dc=us,dc=users,dc=com]  CredentialValidation   5.6.7.8  success  2022-12-04T09:53:06.779Z  adfxx^^sfdffd  [User-Agent, Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1]
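If you would rather avoid the bs4/lxml dependency and want every nested value in its own column (instead of lists like the userID and attributeMap cells above), the standard library's xml.etree.ElementTree can do the same walk recursively. This is a sketch of an alternative, not part of the answer above: flatten_element is a hypothetical helper, dotted tag paths as column names are an assumed naming choice, and the XML is a trimmed copy of the response from the question.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

# Trimmed copy of the response from the question (ipAddress, accessTime
# and ECID omitted for brevity)
xml_doc = """\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Events>
    <eventData>
        <eventID>32669037</eventID>
        <userID>
            <loginID>userone</loginID>
            <userDN>cn=userone,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>Logout</type>
        <status>success</status>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
    <eventData>
        <eventID>62669036</eventID>
        <userID>
            <loginID>usertwo</loginID>
            <userDN>cn=usertwo,cn=Users,dc=us,dc=users,dc=com</userDN>
        </userID>
        <type>CredentialValidation</type>
        <status>success</status>
        <attributeMap>
            <attribute>
                <key>User-Agent</key>
                <value>Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) CasperJS/1.1.3+PhantomJS/2.1.1 Safari/538.1</value>
            </attribute>
        </attributeMap>
    </eventData>
</Events>"""


def flatten_element(elem: ET.Element, prefix: str = "") -> Dict[str, str]:
    """Hypothetical helper: {dotted.tag.path: text} for every leaf below elem."""
    children = list(elem)
    if not children:
        # Leaf element: store its stripped text under the accumulated path
        return {prefix: (elem.text or "").strip()}
    flat: Dict[str, str] = {}
    counts = Counter(child.tag for child in children)
    seen = Counter()
    for child in children:
        name = child.tag
        if counts[child.tag] > 1:
            # Repeated sibling tags get an index suffix so keys don't collide
            name = f"{child.tag}{seen[child.tag]}"
            seen[child.tag] += 1
        path = f"{prefix}.{name}" if prefix else name
        # Recursion is what reaches every level, however deep
        flat.update(flatten_element(child, path))
    return flat


root = ET.fromstring(xml_doc)
# One flat dict (one table row) per <eventData>
rows = [flatten_element(event) for event in root.findall("eventData")]
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to pd.DataFrame(rows) or written with csv.DictWriter. The index suffix for repeated siblings (attribute0, attribute1 and so on) is a defensive assumption; the sample response has only one <attribute> per event, and if events differ in how many they have, pandas fills the missing columns with NaN.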
