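This is my first time using Elasticsearch and its Python client, and I'm a bit confused about how to set up query_body to query a dense vector field. Below are the steps I've taken so far. Please help me create a query body I can use in my search call.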
import json
from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer, util
embedder = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
with open('my_folder/my_docs.json', 'r') as file:
    documents = json.load(file)
#STEP 1: Embedding documents
for d in documents:
    # Store the embedding under the same name as the dense_vector field in the mapping;
    # tolist() turns the NumPy array into plain floats that serialize to JSON
    d['doc_vector'] = embedder.encode(d['content']).tolist()
#STEP 2: Defining Mapping Dictionary
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "content": {
                "type": "text"
            },
            "doc_vector": {
                "type": "dense_vector",
                "dims": 768
            }
        }
    }
}
#STEP 3: Creating the Client
client = Elasticsearch("http://localhost:9200")
# STEP 4: Creating Index
response = client.indices.create(
    index="my_doc_dense_index",
    body=mapping,
    ignore=400  # ignore 400 "index already exists" error
)
# STEP 5: Bulk Uploading docs to Index
resp = helpers.bulk(
    client,
    documents,
    index='my_doc_dense_index'
)
#STEP 6: Example Query
query = "Who is the tennis champion in women's tennis?"
#STEP 7: Encoding Query
# Encode a single string (not a list) to get one 1-D vector of 768 floats
encoded_query = embedder.encode(query)
#STEP 8: Setting up query body with encoded query
query_body = ???????
#STEP 9: submit a search query to ElasticSearch
docs = client.search(body = query_body, index="my_doc_dense_index", size=10)
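All the code from step 1 to step 7 runs fine. I need help building the dense vector query in step 8 so that I can use it in step 9. Can somebody help?
Thanks in advance, Kai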
1 answer
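You can use the knn option of the search method, or the knn_search method, and pass it a query object containing your encoded query vector for the dense_vector field. See the official Elasticsearch documentation on search and knn_search for details.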
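A minimal sketch of what step 8 could look like, assuming Elasticsearch 8.x with the 8.x Python client. The helper names `build_knn_query` and `build_script_score_query` and the `KNN_MAPPING` dict are hypothetical, written for this illustration. For approximate kNN the `doc_vector` field has to be indexed with a similarity; the `"index": True` and `"similarity": "cosine"` settings are assumptions added on top of the question's mapping (older 8.x releases do not index dense vectors by default). The `script_score` variant is a swapped-in alternative, not part of the answer above: exact, brute-force cosine scoring that also works on 7.x clusters, where the `knn` option is not available.

```python
from typing import Any, Dict, Sequence

# Hypothetical mapping: the question's mapping plus the settings approximate kNN needs.
KNN_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "content": {"type": "text"},
            "doc_vector": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,  # assumption: required for kNN on older 8.x releases
                "similarity": "cosine",
            },
        }
    }
}


def build_knn_query(
    query_vector: Sequence[float],
    field: str = "doc_vector",
    k: int = 10,
    num_candidates: int = 100,
) -> Dict[str, Any]:
    """Hypothetical helper: the clause passed as the `knn` option of search (8.x)."""
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "field": field,
        # float() turns NumPy float32 values into plain floats for JSON
        "query_vector": [float(x) for x in query_vector],
        "k": k,
        "num_candidates": num_candidates,
    }


def build_script_score_query(
    query_vector: Sequence[float],
    field: str = "doc_vector",
) -> Dict[str, Any]:
    """Hypothetical helper: exact cosine scoring over all docs via script_score."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; +1.0 keeps scores non-negative,
                    # which script_score requires
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": [float(x) for x in query_vector]},
                },
            }
        }
    }
```

With the question's variables, step 8 and step 9 might then become `client.search(index="my_doc_dense_index", knn=build_knn_query(encoded_query), size=10)` on 8.x, or `client.search(index="my_doc_dense_index", body=build_script_score_query(encoded_query), size=10)` with the script_score variant. Because a field's mapping generally cannot be changed in place, switching to `KNN_MAPPING` would mean recreating (or reindexing into) the index. The kNN route trades a little accuracy for speed on large indices; script_score scans every document but returns exact cosine rankings.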