pandas: "html5lib not found" error when reading HTML into a DataFrame in Python

Posted by eaf3rand on 2022-12-16 in Python


When I try to read HTML into a DataFrame, I get the following error about html5lib.
Here is the code:

!pip install html5lib
!pip install lxml
!pip install beautifulsoup4

import html5lib
import lxml
import pandas as pd
from bs4 import BeautifulSoup

# read_html returns a list of DataFrames, one per table found on the page
table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

Here is the error:

ImportError                               Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913                   thousands=thousands, attrs=attrs, encoding=encoding,
    914                   decimal=decimal, converters=converters, na_values=na_values,
--> 915                   keep_default_na=keep_default_na)

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    737     retained = None
    738     for flav in flavor:
--> 739         parser = _parser_dispatch(flav)
    740         p = parser(io, compiled_match, attrs, encoding)
    741 

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
    680     if flavor in ('bs4', 'html5lib'):
    681         if not _HAS_HTML5LIB:
--> 682             raise ImportError("html5lib not found, please install it")
    683         if not _HAS_BS4:
    684             raise ImportError(

ImportError: html5lib not found, please install it

Any help would be greatly appreciated. Thanks.

pandas

Source: https://stackoverflow.com/questions/49042224/error-in-reading-html-to-data-frame-in-python-html5lib-not-found

4 Answers


pxyaymoc1#

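The error message means that html5lib is not installed. Run

pip install html5lib

in your terminal.
If you are calling it from a Jupyter notebook (as you do with the ! prefix), try restarting the kernel so that the newly installed package is loaded.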
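One way to check whether the install actually reached the interpreter behind the notebook kernel is to ask that interpreter directly. The snippet below is a minimal sketch, assuming Python 3.4 or later (the traceback in the question comes from Python 2.7, where `importlib.util.find_spec` is not available). The helper name `missing_parsers` is made up for this illustration and is not part of pandas.

```python
import importlib.util
import sys


def missing_parsers(names=("html5lib", "lxml", "bs4")):
    """Return the module names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter the kernel is really running on; pip has to install into this one.
print("interpreter:", sys.executable)
print("missing:", missing_parsers())
```

If html5lib shows up as missing even after `!pip install html5lib`, the `pip` on the shell path may belong to a different Python than the kernel. Running `!{sys.executable} -m pip install html5lib` in the notebook and then restarting the kernel is a common way to rule that out.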

Answered on 2022-12-16

mmvthczy2#

I got this exact error while trying to read a saved .htm file in the Spyder IDE.
This code produced the html5lib error:

import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")

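I know html5lib is installed and working, because I have other scripts that use it.
For whatever reason, the file path has to be a raw string literal (put an r in front of the path string).
This code worked for me: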

import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
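A likely reason the r prefix matters is that, in an ordinary string literal, a backslash followed by certain letters is an escape sequence, so the string pandas receives is no longer the path that was typed. The sketch below illustrates this with a hypothetical path (`F:\tables\aaaa.htm` is invented for the example and does not come from the answer).

```python
# Hypothetical Windows path, used only to show how escapes change the string.
plain = "F:\tables\aaaa.htm"   # "\t" becomes a tab, "\a" becomes the BEL character
raw = r"F:\tables\aaaa.htm"    # raw literal: every backslash is kept as written

# repr() makes the difference visible: the plain string contains no backslashes at all.
print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

Writing the path with forward slashes (`"F:/tables/aaaa.htm"`) or doubling each backslash avoids the same problem without a raw string.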

Answered on 2022-12-16

eit6fx6z3#

Check your file name: the same error is shown even when the actual problem is a wrong file name.

Answered on 2022-12-16

sc4hvdpw4#

I ran into this error when the path to the local file I was trying to open was incorrect. So make sure you are pointing at the right location!
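Since a wrong path can surface as a misleading html5lib message, it may help to verify the file before handing it to pandas. This is a minimal sketch using only the standard library; `require_existing_file` is a hypothetical helper written for this example, not a pandas function.

```python
from pathlib import Path


def require_existing_file(path):
    """Return `path` as a Path, or raise FileNotFoundError if it is not an existing file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    return p


# Intended use (the path is a placeholder for your own file):
#   import pandas as pd
#   tables = pd.read_html(str(require_existing_file(r"C:\some\folder\page.htm")))
```

With this check in place, a mistyped path fails with a FileNotFoundError that names the path, rather than with a parser error.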

Answered on 2022-12-16
