Elasticsearch: removing duplicates and sorting (aggs + sort)

dsf9zpds  posted on 2023-10-17 in ElasticSearch

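I'm trying to find the best solution for a query that returns a sorted collection, which I then deduplicate with aggs. The deduplication works fine, but when I add a sort to the query, e.g.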

"query": {..},
"sort": {.. "body.make": "asc" ..}

I expect the aggs to return their results in that order too, but they always seem to be ordered by query score.

// Here I'm collecting all body.vin values to remove duplicates 
  // and then returning only the first in each result set.
  "aggs": {
    "dedup": {
      "terms": {
        "size": 8,
        "field": "body.vin"
      },
      "aggs": {
        "dedup_docs": {
          "top_hits": {
            "size": 1,
            "_source": false
          }
        }
      }
    }
  },
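
For reference, a minimal Python sketch of how the response to an aggregation like the one above could be flattened into one document ID per `body.vin`. The response dict is hand-written and hypothetical, shaped like what Elasticsearch returns for a `terms` + `top_hits` aggregation; the function name `first_hit_per_bucket` and the sample IDs are made up for illustration. Note that the bucket order comes from the `terms` aggregation itself (by default `doc_count` descending), while the top-level `sort` only applies to the regular hits.

```python
def first_hit_per_bucket(response, agg_name="dedup", hits_name="dedup_docs"):
    """Return (bucket key, _id of first hit) pairs in the order buckets arrive."""
    result = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:  # a bucket always has at least one doc, but stay defensive
            result.append((bucket["key"], hits[0]["_id"]))
    return result


# Hypothetical response fragment (only the fields the function reads).
sample_response = {
    "aggregations": {
        "dedup": {
            "buckets": [
                {"key": "vin-123", "doc_count": 2,
                 "dedup_docs": {"hits": {"hits": [{"_id": "a1", "_score": 1.3}]}}},
                {"key": "vin-456", "doc_count": 1,
                 "dedup_docs": {"hits": {"hits": [{"_id": "b7", "_score": 0.9}]}}},
            ]
        }
    }
}

print(first_hit_per_bucket(sample_response))
```

The flattening keeps whatever order the buckets arrive in, so any ordering has to be decided by the aggregation or applied afterwards on the client.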

I tried putting a terms aggregation in between to see whether that would sort:

// Same thing again, except that here I attempt to sort on body.make
// in the document. I now realize that each bucket is a collection of
// duplicates, so this sorts within each set of duplicates and not
// across the final results.
  "aggs": {
    "dedup": {
      "terms": {
        "size": 8,
        "field": "body.vin"
      },
      "aggs": {
        "order": {
          "terms": {
            "field": "body.make",
            "order": {
              "_term": "asc"
            }
          },
          "aggs": {
            "dedup_docs": {
              "top_hits": {
                "size": 1,
                "_source": false
              }
            }
          }
        }
      }
    }
  },
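
The point made in the comment above can be illustrated without a cluster. Below is a rough pure-Python simulation, on made-up documents, of nesting a `body.make` terms aggregation inside a `body.vin` terms aggregation. It assumes the outer buckets follow the default terms ordering (`doc_count` descending, ties broken by key ascending) and is only an approximation of what Elasticsearch does; `simulate_vin_then_make` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up documents: three VINs with a varying number of duplicates.
docs = [
    {"vin": "v1", "make": "toyota"},
    {"vin": "v1", "make": "toyota"},
    {"vin": "v2", "make": "audi"},
    {"vin": "v3", "make": "nissan"},
    {"vin": "v3", "make": "nissan"},
    {"vin": "v3", "make": "nissan"},
]


def simulate_vin_then_make(documents):
    """Group by vin first, then sort the makes *inside* each vin bucket."""
    by_vin = defaultdict(list)
    for doc in documents:
        by_vin[doc["vin"]].append(doc)
    # Outer buckets: doc_count descending, key ascending as tie-break (assumed).
    outer = sorted(by_vin.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    # Inner "order": {"_term": "asc"} only orders makes within one vin bucket.
    return [(vin, sorted({d["make"] for d in group})) for vin, group in outer]


buckets = simulate_vin_then_make(docs)
makes_in_result_order = [makes[0] for _, makes in buckets]
print(buckets)
print(makes_in_result_order)
```

In this simulation the makes come out as nissan, toyota, audi, which is not ascending: the inner sort never reorders the outer VIN buckets.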

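But the aggregated results always come back ordered by score.
I also looked into adjusting the score based on the query sort, so that the aggregation (which returns by score) would come back in the right order, but there doesn't seem to be any way to do that with sort: {}.
If anyone has had success sorting results while also removing duplicates, or has ideas or suggestions, please let me know.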


brccelvz #1

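This is not the ideal solution, because it only allows sorting on one field. The best approach would be to change the score/boost of the sorted results.

Trying to explain the problem made me realize how this can be done, once I grasped the concept of buckets or, more importantly, how to pass them along. I'm still interested in a sort + score adjust solution, but with aggregations it works: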

// here we first aggregate all body.make, so first results might
// {"toyota": {body.vin 123}, "toyota": {body.vin 123}...} and the
// next result passed into the dedup aggregate would be say
// {"nissan"...
  "aggs": {
    "sort": {
      "terms": {
        "size": 8,
        "field": "body.make",
        "order": {
          "_term": "desc"
        }
      },
      "aggs": {
        "dedup": {
          "terms": {
            "size": 8,
            "field": "body.vin"
          },
          "aggs": {
            "dedup_docs": {
              "top_hits": {
                "size": 1,
                "_source": false
              }
            }
          }
        }
      }
    }
  },
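
A minimal Python sketch of how the nested aggregation above might be built and its response read back in order. The helper names `build_sorted_dedup_aggs` and `flatten_sorted_dedup` are hypothetical, and the sample response is hand-written in the shape Elasticsearch returns for nested `terms` buckets. The order key `_term` is kept as in the answer; to my knowledge, Elasticsearch 6.0 and later call it `_key`, so treat that as something to check against your version.

```python
def build_sorted_dedup_aggs(sort_field, dedup_field, size=8, direction="desc"):
    """Build the 'sort by one field, then dedup' aggregation from the answer."""
    return {
        "sort": {
            "terms": {
                "size": size,
                "field": sort_field,
                "order": {"_term": direction},  # "_key" on newer versions (assumption)
            },
            "aggs": {
                "dedup": {
                    "terms": {"size": size, "field": dedup_field},
                    "aggs": {
                        "dedup_docs": {"top_hits": {"size": 1, "_source": False}}
                    },
                }
            },
        }
    }


def flatten_sorted_dedup(response):
    """Walk outer (sort) buckets, then inner (dedup) buckets, keeping order."""
    rows = []
    for make_bucket in response["aggregations"]["sort"]["buckets"]:
        for vin_bucket in make_bucket["dedup"]["buckets"]:
            hits = vin_bucket["dedup_docs"]["hits"]["hits"]
            if hits:
                rows.append((make_bucket["key"], vin_bucket["key"], hits[0]["_id"]))
    return rows


def _vin_bucket(vin, doc_id):
    # Hypothetical helper that fakes one inner bucket of a response.
    return {"key": vin, "doc_count": 1,
            "dedup_docs": {"hits": {"hits": [{"_id": doc_id}]}}}


aggs = build_sorted_dedup_aggs("body.make", "body.vin")

# Hand-written sample response, outer buckets already in descending key order.
sample_response = {"aggregations": {"sort": {"buckets": [
    {"key": "toyota", "dedup": {"buckets": [_vin_bucket("vin-123", "a1")]}},
    {"key": "nissan", "dedup": {"buckets": [_vin_bucket("vin-456", "b7"),
                                            _vin_bucket("vin-789", "c2")]}},
]}}}

print(flatten_sorted_dedup(sample_response))
```

As the answer notes, this only orders by the single field used in the outer `terms` aggregation; within one make, the VIN buckets still follow the default terms ordering.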
