PyPDF2: TypeError when trying to extract text (Python 3.x)

Posted by gk7wooem on 2022-12-24 in Python

I installed pyPDF successfully, but its extractText method did not work well, so I decided to try PyPDF2. The problem is that extracting text raises an exception:

Traceback (most recent call last):
  File "C:\Users\Asus\Desktop\pfdtest.py", line 44, in <module>
    test2()
  File "C:\Users\Asus\Desktop\pfdtest.py", line 41, in test2
    print(mypdf.getPage(0).extractText())
  File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1701, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1783, in __init__
    stream = StringIO(stream.getData())
TypeError: initial_value must be str or None, not bytes

Here is my sample code:

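from PyPDF2 import PdfFileReader  # import missing from the original snippet

filename = "myfile.pdf"
f = open(filename, 'rb')  # PDFs must be opened in binary mode
mypdf = PdfFileReader(f)
print(f, mypdf, mypdf.getNumPages())
print(mypdf.getPage(0).extractText())  # this call raises the TypeError above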

It correctly determines the number of pages in the PDF, but fails when reading the content stream.

python-3.x

Source: https://stackoverflow.com/questions/17270387/pypdf2-typeerror-when-trying-to-extract-text

1 Answer

mmvthczy #1

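This is a compatibility issue between PyPDF2 and Python 3.
In my case, I solved it by replacing pdf.py and utils.py with the patched versions linked in the original answer. Those versions essentially check whether you are running Python 3 and, if so, handle the data as bytes instead of strings.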
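
To make the cause concrete, here is a minimal sketch using only the standard library, not PyPDF2's actual code. The `data` value is a made-up stand-in for what `stream.getData()` returns on Python 3, and `make_stream` is a hypothetical helper that illustrates the kind of version check described above; the real patched files may do this differently.

```python
import io
import sys

# Stand-in for the bytes that stream.getData() returns on Python 3 (assumed sample data).
data = b"BT /F1 12 Tf (Hello) Tj ET"

# The failing pattern from the traceback: a text buffer initialised with bytes.
try:
    io.StringIO(data)
except TypeError as exc:
    error_message = str(exc)
    print("StringIO failed:", error_message)


def make_stream(raw):
    """Hypothetical helper: pick a buffer type that matches the data type."""
    if sys.version_info[0] >= 3 and isinstance(raw, bytes):
        return io.BytesIO(raw)  # bytes need a binary buffer on Python 3
    return io.StringIO(raw)     # str data still goes into a text buffer


stream = make_stream(data)
first_token = stream.read(2)
print(first_token)
```

The point is only that `StringIO` accepts `str` while `BytesIO` accepts `bytes`, so any code written for Python 2 strings has to choose the buffer type explicitly on Python 3.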

Answered 2022-12-24
