ElasticSearch: having to loop over a script score query

Posted by imzjd6km on 2022-10-06 in ElasticSearch

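Hi, I am developing a small service that computes the similarity between items. I ingested my data into an index in an ElasticSearch cluster, where each item is represented by a fixed-dimension numerical vector together with an id. I have written a function that takes a list of ids and has ElasticSearch compute the items most similar to each of them. However, in my query I have to iterate over every document/item to extract its numerical vector. Is there a way to make this more efficient, so that I don't have to query ElasticSearch inside a Python loop?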

from typing import List

from elasticsearch import Elasticsearch


def query_es_for_similar_items(es: Elasticsearch, item_ids: List[int], index: str, n=100):
    query1 = {"query": {"ids": {"type": "_doc", "values": item_ids}}}
    documents = es.search(index=index, body=query1)
    d_res = dict()
    for document in documents["hits"]["hits"]:
        item_id = int(document["_id"])
        query = {
            "size": n,
            "query": {
                "script_score": {
                    "query": {"bool": {"must_not": [{"match": {"_id": item_id}}]}},
                    "script": {
                        "source": "cosineSimilarity(params.query_value, doc[params.field]) + 1",
                        "params": {
                            "field": "embeddings",
                            "query_value": document["_source"]["embeddings"],
                        },
                    },
                }
            },
        }
        resp = es.search(index=index, body=query)
        ranked_scores = [
            {
                "item_id": document["_id"],
                "similarity": document["_score"] / 2,
            }
            for document in resp["hits"]["hits"]
        ]
        d_res[item_id] = ranked_scores
    return d_res
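
As a side note on the `+ 1` in the script and the `/ 2` applied to `_score`: Elasticsearch rejects negative scores, so the cosine similarity, which lies in [-1, 1], is shifted to [0, 2] and then halved back into [0, 1]. The value stored under `similarity` is therefore (cos + 1) / 2 rather than the raw cosine. A small pure-Python sketch of that mapping (the function names are illustrative only and not part of any Elasticsearch API):

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shifted_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Mirrors the script source: cosine + 1, so the score is never negative."""
    return cosine(a, b) + 1.0


def normalized_similarity(score: float) -> float:
    """Mirrors the `_score / 2` step: maps [0, 2] onto [0, 1]."""
    return score / 2.0
```

Under this mapping, identical directions give 1.0, orthogonal vectors give 0.5, and opposite directions give 0.0.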

92dk7w1h1#

You just need to use a filter in the script score query body and read the _score from the response body returned by Elasticsearch.

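from typing import List

from elasticsearch import Elasticsearch


# NOTE: the snippet as posted used `item_id` and `document` without defining them.
# They are supplied here as the keyword-only arguments `query_item_id` and
# `query_vector`; that is an assumption about what was intended.
def query_es_for_similar_items(es: Elasticsearch, item_ids: List[int], index: str, n=100,
                               *, query_item_id: int, query_vector: List[float]):
    query = {
        "size": n,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        # leave out the item the query vector belongs to
                        "must_not": [{"match": {"_id": query_item_id}}],
                        # only score the documents whose ids are listed
                        "filter": {"ids": {"values": item_ids}},
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.query_value, doc[params.field]) + 1",
                    "params": {
                        "field": "embeddings",
                        "query_value": query_vector,
                    },
                },
            }
        },
    }
    documents = es.search(index=index, body=query)
    d_res = dict()
    for document in documents["hits"]["hits"]:
        item_id = int(document["_id"])
        # the script returns cosine + 1; halve it and keep the float
        # (wrapping the score in int() would truncate it to 0, 1 or 2)
        d_res[item_id] = document["_score"] / 2
    return d_res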
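
Note that a single script_score query accepts only one query vector, so the filter restricts which documents get scored but does not by itself replace the loop when every id needs its own ranking. One possible way to at least collapse the loop into a single round trip is the multi search API. The following is a minimal sketch, not tested against a live cluster. `build_msearch_body` and `query_similar_items_batched` are hypothetical helper names, and it assumes the vectors were already fetched (for example with the `ids` query from the question) into a dict mapping each id to its embedding.

```python
from typing import Any, Dict, List


def build_msearch_body(vectors: Dict[int, List[float]], n: int = 100) -> List[Dict[str, Any]]:
    """Build the alternating header/body list used by the multi search API."""
    body: List[Dict[str, Any]] = []
    for item_id, vector in vectors.items():
        body.append({})  # empty header: use the index given on the request itself
        body.append({
            "size": n,
            "query": {
                "script_score": {
                    # same exclusion as in the question: skip the query item itself
                    "query": {"bool": {"must_not": [{"match": {"_id": item_id}}]}},
                    "script": {
                        "source": "cosineSimilarity(params.query_value, doc[params.field]) + 1",
                        "params": {"field": "embeddings", "query_value": vector},
                    },
                }
            },
        })
    return body


def query_similar_items_batched(es, vectors: Dict[int, List[float]], index: str, n: int = 100):
    """Send every script_score search in one msearch request."""
    resp = es.msearch(index=index, body=build_msearch_body(vectors, n))
    results = {}
    # responses are returned in the same order the searches were submitted;
    # a failed sub-search carries an "error" key instead of "hits" (not handled here)
    for item_id, single in zip(vectors, resp["responses"]):
        results[item_id] = [
            {"item_id": hit["_id"], "similarity": hit["_score"] / 2}
            for hit in single["hits"]["hits"]
        ]
    return results
```

Each item is still scored by its own search on the cluster side; what goes away is the per-item HTTP round trip from Python.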

Do let me know whether the solution above works for you (I am using Elasticsearch 8.4.1). Cheers.
