langchain4j [Feature] OpenSearchEmbeddingStore: support data imported from Python

hpxqektj posted 3 months ago in Python

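Hello,

Thanks for this project. While using OpenSearchEmbeddingStore I found a problem: it does not support data that was imported from Python.

Use case:

  • I use this code to import data into, and search, the OpenSearch VectorDB.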
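# Imports assumed for the LangChain release current at the time of the post
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import OpenSearchVectorSearch

# OPENSEARCH_USER, OPENSEARCH_PASSWORD, OPENSEARCH_URL and openai_api_key are defined elsewhere
FORMED_URL = f"https://{OPENSEARCH_USER}:{OPENSEARCH_PASSWORD}@{OPENSEARCH_URL}"

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
print("connecting to db")
docsearch = OpenSearchVectorSearch(index_name="poc-v1",
                                   embedding_function=embeddings,
                                   opensearch_url=FORMED_URL)
# query = "tell me about maha raksha supreme"
query = "What is the football"
docs = docsearch.similarity_search(query, k=10, search_type="approximate_search", space_type="cosinesimil")

print(docs)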

The index schema in OpenSearch:

GET /poc-v1

{
  "poc-v1": {
    "aliases": {},
    "mappings": {
      "properties": {
        "metadata": {
          "type": "object"
        },
        "text": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "vector": {
          "type": "float"
        },
        "vector_field": {
          "type": "knn_vector",
          "dimension": 1536,
          "method": {
            "engine": "nmslib",
            "space_type": "l2",
            "name": "hnsw",
            "parameters": {
              "ef_construction": 512,
              "m": 16
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "knn.algo_param": {
          "ef_search": "512"
        },
        "provided_name": "tata-aia-index-pdfs-test1",
        "knn": "true",
        "creation_date": "1690308693903",
        "number_of_replicas": "1",
        "uuid": "-ppyrXrsR8SnQiPxQq7fAQ",
        "version": {
          "created": "136267827"
        }
      }
    }
  }
}
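
The mapping above contains two vector-like fields: `vector`, mapped as plain `float`, and `vector_field`, mapped as `knn_vector`. A minimal sketch, assuming the `GET /poc-v1` response has been loaded into a Python dict, of how one might list the fields that are actually k-NN vectors. The helper name `find_knn_vector_fields` is hypothetical and only top-level properties are inspected.

```python
from typing import Any, Dict, List


def find_knn_vector_fields(mapping: Dict[str, Any], index_name: str) -> List[str]:
    """Return the top-level field names whose mapping type is knn_vector."""
    properties = mapping[index_name]["mappings"].get("properties", {})
    return [
        name
        for name, spec in properties.items()
        if spec.get("type") == "knn_vector"
    ]


# Trimmed copy of the mapping shown above (settings and sub-fields omitted)
mapping = {
    "poc-v1": {
        "mappings": {
            "properties": {
                "metadata": {"type": "object"},
                "text": {"type": "text"},
                "vector": {"type": "float"},
                "vector_field": {"type": "knn_vector", "dimension": 1536},
            }
        }
    }
}

knn_fields = find_knn_vector_fields(mapping, "poc-v1")
print(knn_fields)
```

Here only `vector_field` qualifies, so a query that targets a field named `vector` would not be reading the embeddings written by the Python client.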




However, in OpenSearchEmbeddingStore.java (langchain4j/langchain4j-opensearch/src/main/java/dev/langchain4j/store/embedding/opensearch/OpenSearchEmbeddingStore.java, line 58 at commit d6b5a79, public class OpenSearchEmbeddingStore implements EmbeddingStore {), the query field appears to be hard-coded:

return ScriptScoreQuery.of(q -> q.minScore(minScore)
                .query(Query.of(qu -> qu.matchAll(m -> m)))
                .script(s -> s.inline(InlineScript.of(i -> i
                        .source("knn_score")
                        .lang("knn")
                        .params("field", JsonData.of("vector"))  // <-- hard-coded value
                        .params("query_value", JsonData.of(vector))
                        .params("space_type", JsonData.of("cosinesimil")))))
                .boost(0.5f));
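
For comparison, here is a rough sketch of the same `script_score` query body with the field name turned into a parameter. It is written in Python to match the client code earlier in the post and is not the langchain4j implementation. The function name `build_knn_script_score_query` and its `vector_field` parameter are hypothetical; the defaults mirror the values hard-coded in the Java snippet.

```python
from typing import Any, Dict, List


def build_knn_script_score_query(
    query_vector: List[float],
    min_score: float,
    vector_field: str = "vector",      # hypothetical parameter; default mirrors the Java code
    space_type: str = "cosinesimil",
    boost: float = 0.5,
) -> Dict[str, Any]:
    """Build an exact k-NN scoring-script query as a plain dict."""
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": vector_field,
                    "query_value": query_vector,
                    "space_type": space_type,
                },
            },
            "min_score": min_score,
            "boost": boost,
        }
    }


# Target the field that the Python-created index actually uses
query = build_knn_script_score_query(
    query_vector=[0.1, 0.2, 0.3],
    min_score=0.0,
    vector_field="vector_field",
)
print(query["script_score"]["script"]["params"]["field"])
```

The resulting dict could be sent as the `query` part of a search request; only the value under `params.field` changes when a different vector field is chosen.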


-> I suggest making this field customizable, as the Python client already allows, e.g.:

docs = db.similarity_search(query,
                            k=1,
                            vector_field='my_vector1',
                            search_type = "approximate_search",
                            space_type = "cosinesimil")

Could you please consider this?

bweufnob  #1

Hey @phuongdo, sure, feel free to open a PR; the change should be simple!
