How do I extract data from an HTML response using Python?

6za6bjd0  posted on 2023-04-22 in Python

I am trying to extract some data from the HTML response I get back after calling an API in Python. Here is the response I receive:

<?xml version="1.0" ?>
 <mgmtResponse responseType="operation" requestUrl="https://6.7.7.7/motion/api/v1/op/enablement/ethernet/summary?deviceIpAddress=10.3.4.3" rootUrl="https://6.7.7.7/webacs/api/v1/op">
   <ethernetSummaryDTO>
     <CoredIdentityCapable>false</CoredIdentityCapable>
     <currentIcmpLatency>0</currentIcmpLatency>
     <deviceAvailability>100</deviceAvailability>
     <deviceName>TRP5504.130.Cored.com</deviceName>
     <deviceRole>Unknown</deviceRole>
     <deviceType>Cored TRP 5504</deviceType>
     <ipAddress>10.3.4.3</ipAddress>
     <locationCapable>false</locationCapable>
     <nrPortsDown>49</nrPortsDown>
     <nrPortsUp>16</nrPortsUp>
     <reachability>Reachable</reachability>
     <softwareVersion>7.8.1</softwareVersion>
     <stackCount>0</stackCount>
     <systemTime>2023-Apr-16, 12:47:51 IST</systemTime>
     <udiDetails>
       <description>TRP5500 4 Slot Single Chassis</description>
       <modelNr>TRP-5504</modelNr>
       <name>Rack 0</name>
       <productId>TRP-5504</productId>
       <udiSerialNr>FOX2304P14Z</udiSerialNr>
       <vendor>Cored Systems, Inc.</vendor>
       <versionId>V01</versionId>
     </udiDetails>
     <upTime>87 days 20 hrs 40 mins 27 secs</upTime>
   </ethernetSummaryDTO>
 </mgmtResponse>

Basically, I want to extract values such as deviceName, softwareVersion, and udiSerialNr from the response. I tried the following code:

if response.status_code == 200:
                #resp = response.text
                resp = response.json()
                api_resp = resp["ethernetSummaryDTO"]
                print(api_resp)

So I tried to convert it to JSON, but I ended up with the following error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

How can I parse it to extract the data I need?

qybjjes1  1#

The response body is XML, not JSON, which is why response.json() fails. Parse the response text as follows:

import xml.etree.ElementTree as ET


def parse_response(xml_string):
    # ET.fromstring returns the root element, <mgmtResponse>
    root = ET.fromstring(xml_string)

    # All of the fields below are children of <ethernetSummaryDTO>
    ethernet_summary = root.find('ethernetSummaryDTO')
    device_name = ethernet_summary.find('deviceName').text
    device_type = ethernet_summary.find('deviceType').text
    ip_address = ethernet_summary.find('ipAddress').text
    nr_ports_down = int(ethernet_summary.find('nrPortsDown').text)
    nr_ports_up = int(ethernet_summary.find('nrPortsUp').text)
    software_version = ethernet_summary.find('softwareVersion').text
    up_time = ethernet_summary.find('upTime').text

    data = {
        'device_name': device_name,
        'device_type': device_type,
        'ip_address': ip_address,
        'nr_ports_down': nr_ports_down,
        'nr_ports_up': nr_ports_up,
        'software_version': software_version,
        'up_time': up_time
    }

    return data


# The function must be defined before it is called.
# `response` is the object returned by the API call in the question.
if response.status_code == 200:
    data = parse_response(response.text)
    print(data)
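
The answer above does not cover udiSerialNr, which is nested one level deeper inside udiDetails. A minimal sketch of one way to reach it, assuming the response has the same shape as in the question: `extract_fields` and `SAMPLE_XML` are hypothetical names introduced here for illustration, and the XML is a trimmed copy of the original response. `findtext` returns None instead of raising when an element is missing, and it accepts a simple path such as "udiDetails/udiSerialNr".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question, used only for illustration
SAMPLE_XML = """<?xml version="1.0" ?>
<mgmtResponse responseType="operation">
  <ethernetSummaryDTO>
    <deviceName>TRP5504.130.Cored.com</deviceName>
    <softwareVersion>7.8.1</softwareVersion>
    <udiDetails>
      <udiSerialNr>FOX2304P14Z</udiSerialNr>
    </udiDetails>
  </ethernetSummaryDTO>
</mgmtResponse>"""


def extract_fields(xml_string):
    """Hypothetical helper: return the three fields asked for in the question."""
    root = ET.fromstring(xml_string)  # root is <mgmtResponse>
    summary = root.find("ethernetSummaryDTO")
    if summary is None:
        # The response does not have the expected shape
        return {}
    return {
        # findtext returns None when the element is absent
        "device_name": summary.findtext("deviceName"),
        "software_version": summary.findtext("softwareVersion"),
        # A slash-separated path descends into nested elements
        "udi_serial_nr": summary.findtext("udiDetails/udiSerialNr"),
    }


print(extract_fields(SAMPLE_XML))
```

With a real API call, the argument would be response.text rather than the sample string.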

2wnc66cl  2#

Given your response (I will assign it to a variable, as if I had just received it from the API call):

xml_data = '''<?xml version="1.0" ?>
 <mgmtResponse responseType="operation" requestUrl="https://6.7.7.7/motion/api/v1/op/enablement/ethernet/summary?deviceIpAddress=10.3.4.3" rootUrl="https://6.7.7.7/webacs/api/v1/op">
   <ethernetSummaryDTO>
     <CoredIdentityCapable>false</CoredIdentityCapable>
     <currentIcmpLatency>0</currentIcmpLatency>
     <deviceAvailability>100</deviceAvailability>
     <deviceName>TRP5504.130.Cored.com</deviceName>
     <deviceRole>Unknown</deviceRole>
     <deviceType>Cored TRP 5504</deviceType>
     <ipAddress>10.3.4.3</ipAddress>
     <locationCapable>false</locationCapable>
     <nrPortsDown>49</nrPortsDown>
     <nrPortsUp>16</nrPortsUp>
     <reachability>Reachable</reachability>
     <softwareVersion>7.8.1</softwareVersion>
     <stackCount>0</stackCount>
     <systemTime>2023-Apr-16, 12:47:51 IST</systemTime>
     <udiDetails>
       <description>TRP5500 4 Slot Single Chassis</description>
       <modelNr>TRP-5504</modelNr>
       <name>Rack 0</name>
       <productId>TRP-5504</productId>
       <udiSerialNr>FOX2304P14Z</udiSerialNr>
       <vendor>Cored Systems, Inc.</vendor>
       <versionId>V01</versionId>
     </udiDetails>
     <upTime>87 days 20 hrs 40 mins 27 secs</upTime>
   </ethernetSummaryDTO>
 </mgmtResponse>'''

You can parse it with the xml.etree.ElementTree module.
For example:

import xml.etree.ElementTree as ET

# ET.fromstring returns the root element, mgmtResponse; [0] selects its first child, ethernetSummaryDTO
root = ET.fromstring(xml_data)[0]
softwareVersion = root.find("softwareVersion").text
deviceName = root.find("deviceName").text

# For the child elements of udiDetails
udiDetails = root.find("udiDetails")
udiSerialNr = [det for det in udiDetails if det.tag == "udiSerialNr"][0].text
# and so on..

The last line, which gets udiSerialNr, is a list comprehension. It builds a list from a loop in a single line and is roughly equivalent to:

udiDetails = root.find("udiDetails")
udiSerialNr = ""
for det in udiDetails:
    if det.tag == "udiSerialNr":
        udiSerialNr = det.text
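
The two forms differ when no child matches: indexing the list comprehension with [0] raises IndexError, while the loop leaves the empty-string default in place. A small sketch of a middle ground, using `next()` with a default so only the first match is taken and a missing tag does not raise. The `udi_details` element here is a shortened stand-in built for illustration, and "noSuchTag" is a made-up tag name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the udiDetails element from the response
udi_details = ET.fromstring(
    "<udiDetails><modelNr>TRP-5504</modelNr>"
    "<udiSerialNr>FOX2304P14Z</udiSerialNr></udiDetails>"
)

# First child whose tag matches, or None when there is no such child
serial_el = next((el for el in udi_details if el.tag == "udiSerialNr"), None)
# Compare with None explicitly: an element without children is falsy
serial = serial_el.text if serial_el is not None else ""

# A tag that does not exist falls back to the empty string instead of raising
missing_el = next((el for el in udi_details if el.tag == "noSuchTag"), None)
missing = missing_el.text if missing_el is not None else ""

print(repr(serial), repr(missing))
```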

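Basically, in ElementTree every element behaves like a list of its child elements, so mgmtResponse is the first list (it has a single entry, ethernetSummaryDTO, which is why I use ET.fromstring(xml_data)[0] to get it).
ethernetSummaryDTO is the second list, but we do not iterate over it; we use the .find() method to get its child elements (for example softwareVersion).
udiDetails is just another list. I used a for loop to get its child elements, but I have just tried it and .find() works there as well, which makes the code simpler, with nothing unnecessary: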

udiDetails = root.find("udiDetails")
udiSerialNr = udiDetails.find("udiSerialNr").text

Much better!
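
The list-like behaviour of elements can be checked directly. The sketch below assumes a trimmed version of the response and shows that an element supports len(), indexing, and iteration over its direct children, which also makes it easy to collect every child of udiDetails into a dict. The variable names `summary`, `udi_details`, and `udi` are introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the response, used only for illustration
xml_data = """<mgmtResponse>
  <ethernetSummaryDTO>
    <softwareVersion>7.8.1</softwareVersion>
    <udiDetails>
      <modelNr>TRP-5504</modelNr>
      <udiSerialNr>FOX2304P14Z</udiSerialNr>
    </udiDetails>
  </ethernetSummaryDTO>
</mgmtResponse>"""

root = ET.fromstring(xml_data)            # <mgmtResponse>
summary = root[0]                         # first child: <ethernetSummaryDTO>
udi_details = summary.find("udiDetails")  # nested <udiDetails> element

# Iterating an element yields its direct children, so a dict of
# tag -> text can be built in a single pass
udi = {child.tag: child.text for child in udi_details}

print(len(root), summary.tag)
print(udi)
```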
