Problem installing pdftotext on CentOS with Python 3.6 because of poppler

cqoc49vn  posted on 2022-11-07  in Python

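I'm having trouble installing pdftotext for Python 3.6 (Anaconda 5.1.0) on CentOS.
First, a few notes:

  • I'm using CentOS 6.7 on VirtualBox.
  • I know that it *can* work, because my IT group has it installed on our servers. **Note:** I found out that our servers *do* have the C++ wrapper installed, and I'm trying to find out how it got there.
  • I'm trying to get an existing application working, so I'm not looking for an alternative to the pdftotext library at this time.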

I followed the instructions in the GitHub repo and have already tried this step:
Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config

But the problem seems to be poppler-cpp-devel. I don't see that package in the output of yum search poppler:

============================= N/S Matched: poppler =============================
poppler-devel.i686 : Libraries and headers for poppler
poppler-devel.x86_64 : Libraries and headers for poppler
poppler-glib.i686 : Glib wrapper for poppler
poppler-glib.x86_64 : Glib wrapper for poppler
poppler-qt.i686 : Qt3 wrapper for poppler
poppler-qt.x86_64 : Qt3 wrapper for poppler
poppler-qt4.i686 : Qt4 wrapper for poppler
poppler-qt4.x86_64 : Qt4 wrapper for poppler
poppler.i686 : PDF rendering library
poppler.x86_64 : PDF rendering library
poppler-data.noarch : Encoding files
poppler-glib-devel.i686 : Development files for glib wrapper
poppler-glib-devel.x86_64 : Development files for glib wrapper
poppler-qt-devel.i686 : Development files for Qt3 wrapper
poppler-qt-devel.x86_64 : Development files for Qt3 wrapper
poppler-qt4-devel.i686 : Development files for Qt4 wrapper
poppler-qt4-devel.x86_64 : Development files for Qt4 wrapper
poppler-utils.x86_64 : Command line utilities for converting PDF files
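
For a long listing, the check for the missing package can be done mechanically. The following is a minimal sketch, assuming the `name.arch : summary` line format shown above; `SEARCH_OUTPUT` holds a few lines copied from that listing, and `package_names` is a hypothetical helper, not part of yum.

```python
# A few lines copied from the `yum search poppler` listing above.
SEARCH_OUTPUT = """\
poppler-devel.x86_64 : Libraries and headers for poppler
poppler-glib-devel.x86_64 : Development files for glib wrapper
poppler-qt4-devel.x86_64 : Development files for Qt4 wrapper
poppler-utils.x86_64 : Command line utilities for converting PDF files
"""


def package_names(search_output):
    """Return the set of package names with the architecture suffix removed."""
    names = set()
    for line in search_output.splitlines():
        if " : " not in line:
            continue  # skip banner lines and blanks
        name_arch = line.split(" : ", 1)[0].strip()
        names.add(name_arch.rsplit(".", 1)[0])
    return names


names = package_names(SEARCH_OUTPUT)
print("poppler-cpp-devel available:", "poppler-cpp-devel" in names)
print("devel packages found:", sorted(n for n in names if n.endswith("-devel")))
```

With the listing above, no package name contains `cpp`, which matches the observation that poppler-cpp-devel is missing from these repositories.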

My IT team gave me the instructions they had tried, and I tried installing poppler-devel and poppler-glib. But every time I run pip install pdftotext, I get the following output:

[root@localhost stack]# pip install pdftotext
Collecting pdftotext
  Using cached https://files.pythonhosted.org/packages/21/35/60094dbadd9de2035873390b1cac25e01da605844eba6a07a53a82fa4adc/pdftotext-2.1.1.tar.gz
Building wheels for collected packages: pdftotext
  Building wheel for pdftotext (setup.py) ... error
  Complete output from command /root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-khm9zova --python-tag cp36:
  /root/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  running bdist_wheel
  running build
  running build_ext
  building 'pdftotext' extension
  creating build
  creating build/temp.linux-x86_64-3.6
  gcc -pthread -B /root/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/root/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
  cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
  pdftotext.cpp:3:42: error: poppler/cpp/poppler-document.h: No such file or directory
  pdftotext.cpp:4:40: error: poppler/cpp/poppler-global.h: No such file or directory
  pdftotext.cpp:5:38: error: poppler/cpp/poppler-page.h: No such file or directory
  pdftotext.cpp:20: error: ‘poppler’ has not been declared
  pdftotext.cpp:20: error: ISO C++ forbids declaration of ‘document’ with no type
  pdftotext.cpp:20: error: expected ‘;’ before ‘*’ token
  pdftotext.cpp: In function ‘void PDF_clear(PDF*)’:
  pdftotext.cpp:26: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp:27: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp: In function ‘int PDF_create_doc(PDF*)’:
  pdftotext.cpp:66: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp:66: error: ‘poppler’ has not been declared
  pdftotext.cpp:67: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp: In function ‘int PDF_unlock(PDF*, char*)’:
  pdftotext.cpp:75: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp: In function ‘int PDF_init(PDF*, PyObject*, PyObject*)’:
  pdftotext.cpp:105: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp: In function ‘PyObject* PDF_read_page(PDF*, int)’:
  pdftotext.cpp:119: error: ‘poppler’ has not been declared
  pdftotext.cpp:119: error: expected initializer before ‘*’ token
  pdftotext.cpp:120: error: ‘poppler’ has not been declared
  pdftotext.cpp:120: error: expected ‘;’ before ‘layout_mode’
  pdftotext.cpp:123: error: ‘page’ was not declared in this scope
  pdftotext.cpp:123: error: ‘struct PDF’ has no member named ‘doc’
  pdftotext.cpp:129: error: ‘poppler’ has not been declared
  pdftotext.cpp:129: error: expected initializer before ‘rect’
  pdftotext.cpp:130: error: ‘rect’ was not declared in this scope
  pdftotext.cpp:133: error: ‘layout_mode’ was not declared in this scope
  pdftotext.cpp:133: error: ‘poppler’ has not been declared
  pdftotext.cpp:135: error: ‘poppler’ has not been declared
  pdftotext.cpp:137: error: ‘poppler’ has not been declared
  pdftotext.cpp:138: error: type ‘<type error>’ argument given to ‘delete’, expected pointer
  error: command 'gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for pdftotext
  Running setup.py clean for pdftotext
Failed to build pdftotext
Installing collected packages: pdftotext
  Running setup.py install for pdftotext ... error
    Complete output from command /root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ghuhvuhl/install-record.txt --single-version-externally-managed --compile:
    /root/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
      warnings.warn(msg)
    running install
    running build
    running build_ext
    building 'pdftotext' extension
    creating build
    creating build/temp.linux-x86_64-3.6
    gcc -pthread -B /root/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPOPPLER_CPP_AT_LEAST_0_30_0=0 -I/root/anaconda3/include/python3.6m -c pdftotext.cpp -o build/temp.linux-x86_64-3.6/pdftotext.o -Wall
    cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
    pdftotext.cpp:3:42: error: poppler/cpp/poppler-document.h: No such file or directory
    pdftotext.cpp:4:40: error: poppler/cpp/poppler-global.h: No such file or directory
    pdftotext.cpp:5:38: error: poppler/cpp/poppler-page.h: No such file or directory
    pdftotext.cpp:20: error: ‘poppler’ has not been declared
    pdftotext.cpp:20: error: ISO C++ forbids declaration of ‘document’ with no type
    pdftotext.cpp:20: error: expected ‘;’ before ‘*’ token
    pdftotext.cpp: In function ‘void PDF_clear(PDF*)’:
    pdftotext.cpp:26: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp:27: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp: In function ‘int PDF_create_doc(PDF*)’:
    pdftotext.cpp:66: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp:66: error: ‘poppler’ has not been declared
    pdftotext.cpp:67: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp: In function ‘int PDF_unlock(PDF*, char*)’:
    pdftotext.cpp:75: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp: In function ‘int PDF_init(PDF*, PyObject*, PyObject*)’:
    pdftotext.cpp:105: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp: In function ‘PyObject* PDF_read_page(PDF*, int)’:
    pdftotext.cpp:119: error: ‘poppler’ has not been declared
    pdftotext.cpp:119: error: expected initializer before ‘*’ token
    pdftotext.cpp:120: error: ‘poppler’ has not been declared
    pdftotext.cpp:120: error: expected ‘;’ before ‘layout_mode’
    pdftotext.cpp:123: error: ‘page’ was not declared in this scope
    pdftotext.cpp:123: error: ‘struct PDF’ has no member named ‘doc’
    pdftotext.cpp:129: error: ‘poppler’ has not been declared
    pdftotext.cpp:129: error: expected initializer before ‘rect’
    pdftotext.cpp:130: error: ‘rect’ was not declared in this scope
    pdftotext.cpp:133: error: ‘layout_mode’ was not declared in this scope
    pdftotext.cpp:133: error: ‘poppler’ has not been declared
    pdftotext.cpp:135: error: ‘poppler’ has not been declared
    pdftotext.cpp:137: error: ‘poppler’ has not been declared
    pdftotext.cpp:138: error: type ‘<type error>’ argument given to ‘delete’, expected pointer
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/root/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-1mu2f1n2/pdftotext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ghuhvuhl/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-1mu2f1n2/pdftotext/

I assume the problem here is that it is looking for the C++ compiled files, whereas I can only get glib?
What can I look into?
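
One thing worth checking before retrying pip is whether the header named in the first compiler error exists anywhere on the machine. The following is a rough diagnostic sketch: the header path comes from the log above, while the `INCLUDE_DIRS` list and both function names are assumptions for illustration. `pkg-config --exists` is a standard pkg-config option.

```python
import os
import shutil
import subprocess

# Header path taken from the first "No such file or directory" error in the log.
HEADER = os.path.join("poppler", "cpp", "poppler-document.h")
# Assumed include roots; adjust for your system.
INCLUDE_DIRS = ["/usr/include", "/usr/local/include"]


def find_header(include_dirs=INCLUDE_DIRS, header=HEADER):
    """Return the first full path where the header exists, else None."""
    for root in include_dirs:
        candidate = os.path.join(root, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def pkg_config_has(module="poppler-cpp"):
    """Return True or False from `pkg-config --exists`, or None if pkg-config is absent."""
    if shutil.which("pkg-config") is None:
        return None
    return subprocess.call(["pkg-config", "--exists", module]) == 0


print("header:", find_header())
print("pkg-config knows poppler-cpp:", pkg_config_has())
```

If `find_header()` returns None, the build fails for the reason shown in the log: the poppler C++ headers are not installed, regardless of which glib or Qt packages are present.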

lrl1mhuk #1

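  • *pdftotext* should be in poppler-utils, so try yum install poppler-utils
    **Edit:** Hmm. There is a package called pypoppler in the EPEL repository for CentOS 6, which describes itself as "Python bindings for the Poppler PDF rendering library". I can't see that it contains poppler/cpp/{anything}, but you could give it a try. (You may need to install pycairo first.)

Failing that, you could try installing earlier versions of pdftotext (e.g. pip install pdftotext==1.0.0) to find one that is compatible with CentOS 6. The earliest release is from June 2017, though, so this may not help.
I take it you're not interested in upgrading to CentOS 7?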

ego6inou #2

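There is actually a suitable package for CentOS 7. Installing the poppler-devel package is not enough, because it does not include the required C++ header files. You just need to install the poppler-cpp and poppler-cpp-devel packages:
yum install poppler-cpp poppler-cpp-devel
After that you can install the pdftotext package with pip/pip3, without any custom compilation or environment path variables.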
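
Once the install succeeds, a quick way to confirm that the extension was actually built and linked is to import it and read a file. This is a minimal sketch: `load_pdftotext` and `extract_pages` are hypothetical helper names, and the only library call relied on is `pdftotext.PDF(file_object)`, which yields one string per page when iterated.

```python
import importlib


def load_pdftotext():
    """Return the pdftotext module, or None if it is not installed."""
    try:
        return importlib.import_module("pdftotext")
    except ImportError:
        return None


def extract_pages(path):
    """Return a list with the text of each page of the PDF at `path`."""
    module = load_pdftotext()
    if module is None:
        raise RuntimeError("pdftotext is not installed or failed to import")
    with open(path, "rb") as handle:
        pdf = module.PDF(handle)
        return list(pdf)


print("pdftotext importable:", load_pdftotext() is not None)
```

Calling `extract_pages("some.pdf")` on any PDF you have at hand (the file name here is a placeholder) exercises the compiled extension end to end.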

hrysbysz #3

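I found a solution to this problem. By following the instructions for installing libpoppler-cpp from this link, I was able to install pdftotext successfully.
Follow the instructions in this repository:

On CentOS

On CentOS the libpoppler-cpp library is not included with the system, so we need to build it from source. Note that recent versions of poppler require C++11, which is not available on CentOS, so we build a slightly older version of libpoppler.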


# Build dependencies

yum install wget xz libjpeg-devel openjpeg2-devel

# Download and extract

wget https://poppler.freedesktop.org/poppler-0.47.0.tar.xz
tar -Jxvf poppler-0.47.0.tar.xz
cd poppler-0.47.0

# Build and install

./configure
make
sudo make install

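By default the libraries are installed in /usr/local/lib and /usr/local/include. On CentOS these are not on the default search path, so we need to set PKG_CONFIG_PATH and LD_LIBRARY_PATH to point R to the right directory: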

export LD_LIBRARY_PATH="/usr/local/lib"
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"
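
The same two variables can also be set for a single pip run from Python instead of being exported in the shell. The following is a sketch under stated assumptions: `build_env` and `pip_install` are hypothetical helpers, the `/usr/local` prefix matches the default install location mentioned above, and, unlike the export lines, the sketch prepends to any existing value rather than overwriting it.

```python
import os
import subprocess
import sys


def build_env(prefix="/usr/local", base=None):
    """Return a copy of the environment with the poppler paths prepended."""
    env = dict(os.environ if base is None else base)
    additions = {
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib"),
        "PKG_CONFIG_PATH": os.path.join(prefix, "lib", "pkgconfig"),
    }
    for key, value in additions.items():
        existing = env.get(key)
        # Keep whatever was already set, but search the new directory first.
        env[key] = value + os.pathsep + existing if existing else value
    return env


def pip_install(package="pdftotext", prefix="/usr/local"):
    """Run pip for the current interpreter with the adjusted environment."""
    command = [sys.executable, "-m", "pip", "install", package]
    return subprocess.call(command, env=build_env(prefix))


env = build_env()
print(env["LD_LIBRARY_PATH"])
print(env["PKG_CONFIG_PATH"])
```

`pip_install()` is defined but not called here, so running the sketch only prints the paths. Note that LD_LIBRARY_PATH also has to be set in whatever process later imports pdftotext, since the loader needs it to locate the library under /usr/local/lib.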
