Description
I'm trying to replicate the Langchain tutorial to use OllamaFunctions for web scraping, as shown in a Google Colab environment.
Code
[1] %%capture
!pip install langchain_experimental
[2] from langchain_experimental.llms.ollama_functions import OllamaFunctions
# Named llm because extract() in cell [5] refers to llm
llm = OllamaFunctions(model="llama2:13b",
                      base_url="http://localhost:11434",
                      temperature=0)
[3] %%capture
!pip install -q langchain-openai langchain playwright beautifulsoup4
!playwright install
[4] import nest_asyncio
nest_asyncio.apply()
[5] from langchain.chains import create_extraction_chain
schema = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_summary": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_summary"],
}
def extract(content: str, schema: dict):
    return create_extraction_chain(schema=schema, llm=llm, verbose=True).invoke(content)
[6] import pprint
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import AsyncChromiumLoader
from langchain_community.document_transformers import BeautifulSoupTransformer
def scrape_with_playwright(urls, schema):
    loader = AsyncChromiumLoader(urls)
    docs = loader.load()
    bs_transformer = BeautifulSoupTransformer()
    docs_transformed = bs_transformer.transform_documents(
        docs, tags_to_extract=["span"]
    )
    print("Extracting content with LLM")
    # Grab the first 1000 tokens of the site
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000,
        chunk_overlap=0,
        separators=["\n"]
    )
    splits = splitter.split_documents(docs_transformed)
    print("Number of splits:", len(splits))  # Debugging statement
    if splits:  # Check that the splits list is not empty
        # Process the first split
        extracted_content = extract(schema=schema, content=splits[0].page_content)  # Line where error occurs
        pprint.pprint(extracted_content)
        return extracted_content
    else:
        print("No splits found")  # Debugging statement
        return None
[7] urls = ["https://www.nytimes.com/"]
extracted_content = scrape_with_playwright(urls, schema=schema)
Error
However, I get the following error:
ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/chat/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7b19911300>: Failed to establish a new connection: [Errno 111] Connection refused'))
3 Answers
nfs0ujit1#
I'm seeing the same problem: I can use Ollama from Streamlit, but when I try to access Ollama through Langchain I get the same error message.
70gysomp2#
Friend, a limited suggestion on this: I have seen this with my Langchain setup too. Make sure you start Ollama with "ollama serve" and that you see it listening on the port...
C:\projects\DID\DID_LC_Ollama>ollama serve
time=2024-03-21T22:04:06.277-04:00 level=INFO source=images.go:806 msg="total blobs: 39"
time=2024-03-21T22:04:06.278-04:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-21T22:04:06.280-04:00 level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
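To confirm from Python that something is really listening on the address shown in the log above, a plain TCP connection attempt is enough. The following is only a minimal sketch using the standard library; the helper name port_is_listening is hypothetical, and the default host and port are assumptions copied from the "Listening on 127.0.0.1:11434" log line.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 11434,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: the defaults are assumed from the
    "Listening on 127.0.0.1:11434" line printed by ollama serve.
    """
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing accepts connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Run this in the same environment that runs the Langchain code.
print("Ollama port open:", port_is_listening())
```

If this prints False in the environment where the notebook runs, the "Connection refused" error is expected there no matter how the Langchain objects are configured.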
sqougxex3#
Change the base URL from localhost to the address that ollama serve reports, for example: llm = Ollama(model="llama2", base_url="http://127.0.0.1:11434")
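Before settling on a base_url, it can help to probe the candidate addresses and see which one answers. The sketch below uses only the standard library and treats any HTTP response as "reachable"; the function name first_reachable and the candidate list are assumptions for illustration, not part of Langchain or Ollama. One possible reason localhost and 127.0.0.1 behave differently is that localhost may resolve to an IPv6 address on some systems while the server listens on IPv4 only. Note also that on a hosted Colab runtime both names refer to the Colab machine, not to your own computer.

```python
import http.client
import urllib.error
import urllib.request


def first_reachable(base_urls, timeout: float = 2.0):
    """Return the first URL in base_urls that answers over HTTP, else None.

    Hypothetical helper for illustration; any HTTP response, even an
    error status, counts as proof that a server is listening.
    """
    # An empty ProxyHandler bypasses proxy settings from the environment,
    # which should not apply to a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for url in base_urls:
        try:
            with opener.open(url, timeout=timeout):
                return url
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is reachable.
            return url
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, timeout, or DNS failure: try the next one.
            continue
    return None


if __name__ == "__main__":
    # Assumed candidates: the URL from the question and the one from the log.
    candidates = ["http://localhost:11434", "http://127.0.0.1:11434"]
    print("Reachable base_url:", first_reachable(candidates))
```

Whatever URL this returns can then be passed as base_url; if it returns None, no server is reachable from that environment and changing the URL alone is unlikely to help.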