How to parse HTML with Python 3.8 xml.etree?

Posted by rdlzhqv9 on 2023-05-21 in Python


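I need to parse HTML files with the Python 3.8 xml package. It must be possible, because some xml.etree.ElementTree methods take a parameter whose value can be "xml" or "html", but I can't find an example of how to do it.
When I try to parse an HTML file, I get an exception: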

htmlRoot = etree.ElementTree.parse(filepathname).getroot()

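The parser throws an "undefined entity" exception when it encounters an HTML entity. I assume this is because HTML entities are predefined in HTML, whereas XML predefines only a handful of them.
As the statement shows, I'm using the default parser. Perhaps there is an HTML parser, but I haven't found one. I'm not even sure whether other parsers exist or whether I have to roll my own.
I don't want to use Python's html package, because I need to walk a complete parse tree like the one xml.etree provides. The html package doesn't work that way.
I've found examples of parsing HTML with the lxml package, but lxml isn't part of the standard Python distribution. That's a problem for colleagues who don't know Python and need a "plug and play" application.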
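
The failure described above can be reproduced in a few lines. This is only a minimal sketch: `try_parse` is a hypothetical helper name, and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return the root element, or the error message if parsing fails."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)


# &amp; is one of the five entities that XML predefines, so this parses.
ok = try_parse("<p>fish &amp; chips</p>")

# &eacute; is defined by HTML only, so the XML parser rejects it.
bad = try_parse("<p>caf&eacute;</p>")

print(ok.text)
print(bad)
```

The second call returns an error message mentioning "undefined entity", which matches the exception reported in the question.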


Source: https://stackoverflow.com/questions/69976011/how-to-parse-html-with-python-3-8-xml-etree

1 Answer


ao218c7q1#

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

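import xml.etree.ElementTree as ET

# Parse the XML file and get its root element
tree = ET.parse('country_data.xml')
root = tree.getroot()
# Alternatively, parse from a string holding the same XML:
# root = ET.fromstring(country_data_as_string)
# Print the tag and attributes of each child of the root
for child in root:
    print(child.tag, child.attrib)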

See the documentation for more details: https://docs.python.org/3/library/xml.etree.elementtree.html
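
Note that the example above parses XML, not HTML. For HTML with only the standard library, one possible approach is to let `html.parser.HTMLParser` tokenize the markup and forward its events to `xml.etree.ElementTree.TreeBuilder`, which yields an ordinary ElementTree that can be walked as usual. The following is a rough sketch under stated assumptions, not an official API: `HTMLTreeParser`, `parse_html`, and `VOID_TAGS` are hypothetical names, and the recovery from missing or stray closing tags is deliberately simplistic compared with what a browser or lxml does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never take a closing tag (assumed list for this sketch).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HTMLTreeParser(HTMLParser):
    """Hypothetical helper: forwards HTMLParser events to a TreeBuilder."""

    def __init__(self):
        # convert_charrefs=True turns &eacute;, &amp; etc. into plain text.
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._open = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        attrib = {name: (value if value is not None else "")
                  for name, value in attrs}
        self._builder.start(tag, attrib)
        if tag in VOID_TAGS:
            self._builder.end(tag)  # close void elements immediately
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # ignore stray closing tags
        # Close any unclosed children first, then the element itself.
        while self._open:
            open_tag = self._open.pop()
            self._builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self._open:  # drop text outside the root element
            self._builder.data(data)

    def close(self):
        super().close()
        # Close whatever is still open at end of input.
        while self._open:
            self._builder.end(self._open.pop())
        return self._builder.close()  # the root Element


def parse_html(text):
    """Parse an HTML string and return the root Element."""
    parser = HTMLTreeParser()
    parser.feed(text)
    return parser.close()


root = parse_html(
    "<html><body><p class='x'>Caf&eacute; &amp; bar<br>next</p></body></html>"
)
for elem in root.iter():
    print(elem.tag, elem.attrib, repr(elem.text))
```

To parse a file, read its contents into a string and pass that to `parse_html`. The returned root supports the usual ElementTree methods such as `iter()`, `find()` and `findall()`. Comments and the doctype are ignored in this sketch, and elements with optional closing tags (for example consecutive `<li>` items without `</li>`) end up nested rather than as siblings.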

Answered 2023-05-21
