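Summary
While processing a PDF, I saw an error in the logs.
Steps to reproduce
Just upload a PDF file (image-based) and configure OCR.
Expected behavior
A clear and concise description of what you expected to happen.
Environment
The provided Docker container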
executing command error: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
    check=True,
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
    main()
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
    tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
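The traceback contains two chained exceptions: `subprocess` raises `FileNotFoundError` because no `java` executable can be found, and tabula's `_run` raises `JavaNotFoundError` while handling it, which is why Python prints "During handling of the above exception, another exception occurred". A minimal sketch of that pattern follows; it is not tabula's actual code, and `run_java` plus the local `JavaNotFoundError` class are hypothetical stand-ins.

```python
import subprocess


class JavaNotFoundError(Exception):
    """Hypothetical stand-in for tabula.errors.JavaNotFoundError."""


def run_java(args, java_cmd="java"):
    """Run `java_cmd` with `args`, turning a missing binary into JavaNotFoundError.

    Hypothetical helper that mirrors the pattern visible in the traceback,
    not tabula's real implementation.
    """
    try:
        return subprocess.run(
            [java_cmd, *args],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    except FileNotFoundError:
        # Raising inside the except block chains the original error implicitly
        # (it is stored in __context__), which yields the
        # "During handling of the above exception..." line in the traceback.
        raise JavaNotFoundError(
            f"`{java_cmd}` command is not found from this Python process."
        )


if __name__ == "__main__":
    try:
        run_java(["-version"], java_cmd="no-such-java-binary")
    except JavaNotFoundError as err:
        print(err, "| caused while handling:", type(err.__context__).__name__)
```

In other words, the real failure is the first exception: the process cannot find `java` at all, so tabula never gets to parse the PDF.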
brc7rcf01#
It requires Java to be pre-installed on the system, with the path to Java's bin directory added to your $PATH environment variable.
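Before calling `tabula.read_pdf`, a script can check whether `java` is resolvable from the current process's `PATH`. Below is a minimal sketch using the standard library's `shutil.which`; the helper name `java_on_path` is hypothetical and not part of tabula or Parsr.

```python
import os
import shutil


def java_on_path(path=None):
    """Return the full path of the `java` executable, or None if it is not found.

    `path` defaults to the current process's PATH, which approximates the
    search subprocess performs when it launches `java`.
    """
    return shutil.which("java", path=path)


if __name__ == "__main__":
    found = java_on_path()
    if found is None:
        # Same situation as in the traceback: no `java` on this process's PATH.
        print("java not found; install a JRE and add its bin directory to PATH")
        print("current PATH:", os.environ.get("PATH", ""))
    else:
        print("java found at", found)
```

Running this inside the container gives a quick yes/no answer before any PDF is processed.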
qltillow2#
Can someone help me understand why the parsr image has a Java dependency but Java is not installed? Thanks!
4c8rllxm3#
I agree with @csmizzle. If Java is required, shouldn't it be included in the Docker image?
7z5jn7bk4#
I ran into this issue when running the Docker container:
parsr-parsr-1 | [2023-05-06T18:58:09] INFO (parsr-api/8 on 3ba7089dce28): executing command error: Traceback (most recent call last):
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
parsr-parsr-1 |     check=True,
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 472, in run
parsr-parsr-1 |     with Popen(*popenargs, **kwargs) as process:
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
parsr-parsr-1 |     restore_signals, start_new_session)
parsr-parsr-1 |   File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
parsr-parsr-1 |     raise child_exception_type(errno_num, err_msg, err_filename)
parsr-parsr-1 | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
parsr-parsr-1 |
parsr-parsr-1 | During handling of the above exception, another exception occurred:
parsr-parsr-1 |
parsr-parsr-1 | Traceback (most recent call last):
parsr-parsr-1 |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
parsr-parsr-1 |     main()
parsr-parsr-1 |   File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
parsr-parsr-1 |     tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
parsr-parsr-1 |     output = _run(java_options, kwargs, path, encoding)
parsr-parsr-1 |   File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
parsr-parsr-1 |     raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
parsr-parsr-1 | tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
parsr-parsr-1 |
parsr-parsr-1 | [2023-05-06T18:58:10] INFO (parsr-api/8 on 3ba7089dce28): Number of tables found on the document: 0.
kkih6yb85#
I'll work on this tonight and open a PR tomorrow morning.
r1wp621o6#
Struggling with this issue as well, and also running into difficulties on the client side.
46qrfjad7#
Any updates on this issue? Thanks!