How do I query a dense vector field in Elasticsearch from the Python client?

Posted by xxb16uws on 2022-12-11 in ElasticSearch

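This is my first time using Elasticsearch and its Python client. I'm a bit confused about how to set up query_body to query a dense vector field. Below are the steps I've taken so far. Please help me build a query body I can use in my search call.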

import json

from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('bert-base-nli-stsb-mean-tokens')

with open('my_folder/my_docs.json', 'r') as file:
    documents = json.load(file)

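#STEP 1: Embedding documents

for d in documents:
    # Store the embedding under the same field name as the dense_vector mapping
    # ("doc_vector"), converted to a plain list so it serializes to JSON.
    d['doc_vector'] = embedder.encode(d['content']).tolist()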

#STEP 2: Defining Mapping Dictionary

mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text" 
            },
            "content": {
                "type": "text"
            },
            "doc_vector": {
                "type": "dense_vector",
                "dims": 768
            }
        }
    }
}

#STEP 3: Creating the Client

client = Elasticsearch("http://localhost:9200")

# STEP 4: Creating Index

response = client.indices.create(
    index="my_doc_dense_index",
    body=mapping,
    ignore=400 # ignore 400 already exists code
)

# STEP 5: Bulk Uploading docs to Index

resp = helpers.bulk(
    client,
    documents,
    index = 'my_doc_dense_index')

#STEP 6: Example Query
query = "Who is the tennis champion in women's tennis?"

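#STEP 7: Encoding Query
# Encode a single string (not a list) to get one 768-float vector instead of a 1x768 matrix.
encoded_query = embedder.encode(query).tolist()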

#STEP 8: Setting up query body with encoded query
query_body = ???????

#STEP 9: submit a search query to ElasticSearch

docs = client.search(body = query_body, index="my_doc_dense_index", size=10)

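All of the code from step 1 to step 7 runs fine. I need help building the dense vector query for step 8 so that I can use it in step 9. Could somebody help?
Thanks in advance, Kai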

Answer 1 (taor4pac)

You can use the knn option: pass a query object that targets your dense_vector field to the `search` method (or to the older `knn_search` method).

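from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Example query vector; its length must equal the field's "dims" in the mapping.
my_vector = [0.5, 0.3, 0.2]

knn_query = {
    "field": "my_dense_vector_field",  # the dense_vector field to search
    "query_vector": my_vector,
    "k": 10,                # number of nearest neighbours to return
    "num_candidates": 100   # candidates considered per shard (must be >= k)
}

# run the kNN search
results = es.search(index="my_index", knn=knn_query)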
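Applied to the question's own setup, steps 8 and 9 might look roughly like the sketch below. It assumes an Elasticsearch 8.x cluster recent enough to accept `knn` in `search` (8.4 or later, as far as I know) and the `doc_vector` field from the question's mapping; on 8.x releases before 8.11 that field most likely also needs `"index": true` and a `"similarity"` (for example `"cosine"`) in its mapping before kNN search will work. `build_knn_query` and `summarize_hits` are hypothetical helper names, not part of the client.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_knn_query(
    query_vector: Sequence[float],
    field: str = "doc_vector",  # assumed: the dense_vector field from the question's mapping
    k: int = 10,
    num_candidates: int = 100,
) -> Dict[str, Any]:
    """Build the value for the `knn` argument of `Elasticsearch.search` (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if num_candidates < k:
        # Elasticsearch rejects a kNN search whose num_candidates is smaller than k.
        raise ValueError("num_candidates must be >= k")
    return {
        "field": field,
        # Convert numpy arrays (or any 1-D sequence) to a plain list of floats for JSON.
        "query_vector": [float(x) for x in query_vector],
        "k": k,
        "num_candidates": num_candidates,
    }


def summarize_hits(response: Dict[str, Any]) -> List[Tuple[float, Any]]:
    """Return (score, name) pairs in the order Elasticsearch returned them (highest score first)."""
    return [
        (hit["_score"], hit["_source"].get("name"))
        for hit in response["hits"]["hits"]
    ]
```

With the question's variables, and assuming `encoded_query` is a single 768-float vector (`embedder.encode(query)`, not `embedder.encode([query])`), step 8 becomes `knn_query = build_knn_query(encoded_query)` and step 9 becomes `docs = client.search(index="my_doc_dense_index", knn=knn_query, size=10)`, after which `summarize_hits(docs)` lists the matches.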

For details on `search` and `knn_search`, see the official Elasticsearch documentation.
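If the cluster predates the `knn` option, or `doc_vector` was mapped without kNN indexing, a commonly used alternative is an exact, brute-force `script_score` query built on the Painless `cosineSimilarity` function; it also fits the question's `body=query_body` call directly. This is a different technique from the approximate kNN search above: it scores every document matched by the inner query, so it slows down as the index grows. The sketch assumes the `doc_vector` field name; `build_script_score_query` is a hypothetical helper.

```python
from typing import Any, Dict, Sequence


def build_script_score_query(
    query_vector: Sequence[float],
    field: str = "doc_vector",  # assumed: the dense_vector field from the question's mapping
) -> Dict[str, Any]:
    """Build a request body that ranks documents by cosine similarity (hypothetical helper).

    Exact scoring via a Painless script; the dense_vector field does not
    need to be indexed for approximate kNN.
    """
    return {
        "query": {
            "script_score": {
                # Score every document; replace match_all with a real query to pre-filter.
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity returns a value in [-1, 1]; adding 1.0 keeps
                    # the score non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": [float(x) for x in query_vector]},
                },
            }
        }
    }
```

Step 8 would then be `query_body = build_script_score_query(encoded_query)`, with `encoded_query` again a single vector, and step 9 can stay as written. Newer 8.x clients may warn that `body` is deprecated; passing `query=query_body["query"]` to `client.search` is the keyword-argument form.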
