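Hello,
Thanks for this project. While using OpenSearchEmbeddingStore I ran into a problem: it does not support data that was ingested from Python.
Use case:
- I use the following code to ingest and search data in the OpenSearch vector DB.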
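```
# Import paths assumed for the LangChain Python package in use at the time of this post
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import OpenSearchVectorSearch

# OPENSEARCH_USER, OPENSEARCH_PASSWORD, OPENSEARCH_URL and openai_api_key are defined elsewhere
FORMED_URL = f"https://{OPENSEARCH_USER}:{OPENSEARCH_PASSWORD}@{OPENSEARCH_URL}"
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
print("connecting to db")
docsearch = OpenSearchVectorSearch(
    index_name="poc-v1",
    embedding_function=embeddings,
    opensearch_url=FORMED_URL,
)
# query = "tell me about maha raksha supreme"
query = "What is the football"
docs = docsearch.similarity_search(
    query, k=10, search_type="approximate_search", space_type="cosinesimil"
)
print(docs)
```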
The index mapping in OpenSearch:
```
GET /poc-v1
{
  "poc-v1": {
    "aliases": {},
    "mappings": {
      "properties": {
        "metadata": {
          "type": "object"
        },
        "text": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "vector": {
          "type": "float"
        },
        "vector_field": {
          "type": "knn_vector",
          "dimension": 1536,
          "method": {
            "engine": "nmslib",
            "space_type": "l2",
            "name": "hnsw",
            "parameters": {
              "ef_construction": 512,
              "m": 16
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "knn.algo_param": {
          "ef_search": "512"
        },
        "provided_name": "tata-aia-index-pdfs-test1",
        "knn": "true",
        "creation_date": "1690308693903",
        "number_of_replicas": "1",
        "uuid": "-ppyrXrsR8SnQiPxQq7fAQ",
        "version": {
          "created": "136267827"
        }
      }
    }
  }
}
```
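To make it easier to see which field actually holds the embeddings, here is a minimal sketch that reads it off a mapping like the one above. It assumes the mapping has already been fetched and parsed into a Python dict; the helper name `knn_vector_fields` is hypothetical and not part of any library, and only top-level properties are inspected.

```python
def knn_vector_fields(mapping: dict) -> list:
    """Return the names of top-level fields whose type is knn_vector.

    Hypothetical helper: `mapping` is the inner object of the
    GET /<index> response, i.e. the value stored under the index name.
    """
    props = mapping.get("mappings", {}).get("properties", {})
    return [name for name, spec in props.items()
            if spec.get("type") == "knn_vector"]


# Trimmed copy of the mapping from this post
mapping = {
    "mappings": {
        "properties": {
            "metadata": {"type": "object"},
            "text": {"type": "text"},
            "vector": {"type": "float"},
            "vector_field": {"type": "knn_vector", "dimension": 1536},
        }
    }
}

print(knn_vector_fields(mapping))
```

For this index the only knn_vector field is `vector_field`, which is the name a k-NN query would need to target.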
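However, in OpenSearchEmbeddingStore.java (langchain4j/langchain4j-opensearch/src/main/java/dev/langchain4j/store/embedding/opensearch/OpenSearchEmbeddingStore.java, line 58 at commit d6b5a79, public class OpenSearchEmbeddingStore implements EmbeddingStore), the queried field appears to be hard-coded to "vector", whereas the index above stores its knn_vector embeddings under "vector_field":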
```
return ScriptScoreQuery.of(q -> q.minScore(minScore)
        .query(Query.of(qu -> qu.matchAll(m -> m)))
        .script(s -> s.inline(InlineScript.of(i -> i
                .source("knn_score")
                .lang("knn")
                .params("field", JsonData.of("vector")) // <-- fixed value
                .params("query_value", JsonData.of(vector))
                .params("space_type", JsonData.of("cosinesimil")))))
        .boost(0.5f));
```
I suggest making this field customizable, as the Python API allows, for example:
```
docs = db.similarity_search(query,
                            k=1,
                            vector_field='my_vector1',
                            search_type="approximate_search",
                            space_type="cosinesimil")
```
Could you please consider this?
1 answer
Hey @phuongdo, sure, feel free to open a PR. The change should be straightforward!
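As a rough illustration of the change discussed in this thread, the sketch below builds the same kind of `script_score` request body as the Java snippet in the question, but takes the vector field name as a parameter instead of fixing it. It is written in Python only to show the shape of the request; the function name `knn_script_score_query` and its arguments are hypothetical, and this is not how langchain4j implements or would necessarily implement it.

```python
def knn_script_score_query(vector, field="vector",
                           space_type="cosinesimil", min_score=0.0):
    """Build a k-NN script_score query body with a configurable field.

    Hypothetical helper. The default field "vector" mirrors the value that
    is currently hard-coded in the Java snippet from the question.
    """
    return {
        "query": {
            "script_score": {
                "min_score": min_score,
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,  # no longer a fixed value
                        "query_value": list(vector),
                        "space_type": space_type,
                    },
                },
                "boost": 0.5,
            }
        }
    }


# Target the field that the Python ingestion used in this post
body = knn_script_score_query([0.1, 0.2, 0.3], field="vector_field")
print(body["query"]["script_score"]["script"]["params"]["field"])
```

Keeping "vector" as the default would leave existing indexes working, while callers whose data was written by other tools could pass their own field name.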