I have tried every combination mentioned here and elsewhere, but I keep getting the same error.
Here is my Dockerfile:
FROM python:3.9
RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2
# Install dependencies
RUN apt-get update && apt-get install libgl1 -y
RUN pip install -U nltk
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
COPY . /app
# Run the application:
CMD ["python", "-u", "app.py"]
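The Docker image builds fine (I pass the platform argument to build an image that runs on Linux; my local build machine runs Windows, and the detectron2 library does not install on Windows):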
>>> docker buildx build --platform=linux/amd64 -t my_app .
[+] Building 23.2s (16/16) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 634B 0.0s
=> [internal] load metadata for docker.io/library/python:3.9 0.9s
=> [internal] load build context 0.0s
=> => transferring context: 1.85kB 0.0s
=> [ 1/11] FROM docker.io/library/python:3.9@sha256:6ea9dafc96d7914c5c1d199f1f0195c4e05cf017b10666ca84cb7ce8e269 0.0s
=> CACHED [ 2/11] RUN pip install virtualenv && virtualenv venv -p python3 0.0s
=> CACHED [ 3/11] WORKDIR /app 0.0s
=> CACHED [ 4/11] COPY requirements.txt ./ 0.0s
=> CACHED [ 5/11] RUN pip install -r requirements.txt 0.0s
=> CACHED [ 6/11] RUN git clone https://github.com/facebookresearch/detectron2.git 0.0s
=> CACHED [ 7/11] RUN python -m pip install -e detectron2 0.0s
=> CACHED [ 8/11] RUN apt-get update && apt-get install libgl1 -y 0.0s
=> CACHED [ 9/11] RUN pip install -U nltk 0.0s
=> [10/11] RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ] 22.1s
=> [11/11] COPY . /app 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:83e2495addbc4cdf9b0885e1bb4c5b0fb0777177956eda56950bbf59c095d23b 0.0s
=> => naming to docker.io/library/my_app
But I keep getting the error below when I try to run the image:
>>> docker run -p 8080:8080 my_app
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data] violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data] violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data] EOF occurred in violation of protocol (_ssl.c:1129)>
Traceback (most recent call last):
File "/app/app.py", line 16, in <module>
index = VectorstoreIndexCreator().from_loaders(loaders)
File "/venv/lib/python3.9/site-packages/langchain/indexes/vectorstore.py", line 72, in from_loaders
docs.extend(loader.load())
File "/venv/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
elements = self._get_elements()
File "/venv/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 37, in _get_elements
return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 75, in partition_pdf
return partition_pdf_or_image(
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 137, in partition_pdf_or_image
return _partition_pdf_with_pdfminer(
File "/venv/lib/python3.9/site-packages/unstructured/utils.py", line 43, in wrapper
return func(*args, **kwargs)
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 248, in _partition_pdf_with_pdfminer
elements = _process_pdfminer_pages(
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 293, in _process_pdfminer_pages
_elements = partition_text(text=text)
File "/venv/lib/python3.9/site-packages/unstructured/partition/text.py", line 89, in partition_text
elif is_possible_narrative_text(ctext):
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
if exceeds_cap_ratio(text, threshold=cap_threshold):
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
if sentence_count(text, 3) > 1:
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
sentences = sent_tokenize(text)
File "/venv/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
return _sent_tokenize(text)
File "/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load(f"tokenizers/punkt/{language}.pickle")
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 750, in load
opened_resource = _open(resource_url)
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 876, in _open
return find(path_, path + [""]).open()
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/root/nltk_data'
- '/venv/nltk_data'
- '/venv/share/nltk_data'
- '/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
1 Answer
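I disconnected my machine from WiFi and connected it to my phone's hotspot, and then it ran without any errors because it was now able to download the NLTK packages. A very strange (and silly) issue. I would like to know whether there is a better solution, since nothing else worked for me.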
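One thing that may be worth checking, as a guess from the traceback rather than a confirmed diagnosis: the Dockerfile downloads punkt into /usr/local/nltk_data, but that directory does not appear in the "Searched in" list of the LookupError. If so, the copy fetched at build time would never be found, which would explain why the container still tries to download at run time. The minimal sketch below only compares the two; the helper name `is_searched` is made up for illustration and is not part of NLTK.

```python
import posixpath

# Directories NLTK reported searching, copied from the LookupError above.
SEARCHED = [
    "/root/nltk_data",
    "/venv/nltk_data",
    "/venv/share/nltk_data",
    "/venv/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "",
]

# download_dir passed to nltk.download() in the Dockerfile.
DOWNLOAD_DIR = "/usr/local/nltk_data"


def is_searched(download_dir, searched):
    """Return True if download_dir is one of the searched directories.

    Hypothetical helper: paths are normalized so that a trailing slash
    does not cause a false mismatch.
    """
    target = posixpath.normpath(download_dir)
    return any(posixpath.normpath(p) == target for p in searched)


print(is_searched(DOWNLOAD_DIR, SEARCHED))  # False
```

If that is indeed the cause, pointing NLTK at the build-time directory might remove the need for network access at run time, for example by setting the `NLTK_DATA` environment variable to `/usr/local/nltk_data` in the Dockerfile, or by appending that path to `nltk.data.path` in app.py before the loaders run. The log also shows a failed download of averaged_perceptron_tagger, so that resource would presumably need to be downloaded at build time as well. I have not tested this against this image.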