Parsr Java issue showing up during processing

ghhaqwfi  posted 5 months ago in Java

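Summary

While processing a PDF, I see an error in the logs.

Steps to reproduce

Upload a PDF file (image-based) with OCR enabled.

Expected behavior

A clear and concise description of what you expected to happen.

Environment

The provided Docker container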

executing command error: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
    check=True,
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
    main()
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
    tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`

brc7rcf0 #1

It requires Java to be preinstalled on the system, with the path to Java's bin directory added to your PATH environment variable.
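A quick way to confirm this from inside the container is to check whether the same Python process can resolve `java` on its PATH. The following is a minimal sketch, not part of Parsr or tabula; the helper name `java_available` is hypothetical and only uses the standard library.

```python
import shutil
import subprocess


def java_available(command: str = "java") -> bool:
    """Return True if `command` resolves on PATH and runs `-version` successfully.

    Hypothetical helper for illustration; it is not a Parsr or tabula API.
    """
    # shutil.which performs the same PATH lookup a subprocess call would rely on.
    path = shutil.which(command)
    if path is None:
        return False
    try:
        # `java -version` exits with status 0 on a working installation.
        subprocess.run([path, "-version"], check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


if __name__ == "__main__":
    if java_available():
        print("java found on PATH")
    else:
        print("java not found; install a Java runtime and add its bin directory to PATH")
```

If this reports that `java` is missing, tabula's `read_pdf` would be expected to fail with the `JavaNotFoundError` shown in the traceback above.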

qltillow #2

Can someone help me understand why the parsr image has a Java dependency but does not have Java installed? Thanks!

4c8rllxm #3

I agree with @csmizzle. If Java is required, shouldn't the Docker image include it?

7z5jn7bk #4

I ran into this issue when running the Docker container:

parsr-parsr-1 | [2023-05-06T18:58:09] INFO (parsr-api/8 on 3ba7089dce28): executing command error: Traceback (most recent call last):
parsr-parsr-1 | File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
parsr-parsr-1 | check=True,
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 472, in run
parsr-parsr-1 | with Popen(*popenargs, **kwargs) as process:
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
parsr-parsr-1 | restore_signals, start_new_session)
parsr-parsr-1 | File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
parsr-parsr-1 | raise child_exception_type(errno_num, err_msg, err_filename)
parsr-parsr-1 | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
parsr-parsr-1 |
parsr-parsr-1 | During handling of the above exception, another exception occurred:
parsr-parsr-1 |
parsr-parsr-1 | Traceback (most recent call last):
parsr-parsr-1 | File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
parsr-parsr-1 | main()
parsr-parsr-1 | File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
parsr-parsr-1 | tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
parsr-parsr-1 | File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
parsr-parsr-1 | output = _run(java_options, kwargs, path, encoding)
parsr-parsr-1 | File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
parsr-parsr-1 | raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
parsr-parsr-1 | tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
parsr-parsr-1 |
parsr-parsr-1 | [2023-05-06T18:58:10] INFO (parsr-api/8 on 3ba7089dce28): The number of tables found on the document is 0.

kkih6yb8 #5

I'll work on this tonight and open a PR tomorrow morning.

r1wp621o #6

People are working on fixing this, and have also been running into difficulties on the client side.

46qrfjad #7

Is there any update on this issue? Thanks!
