ElasticSearch aggregation on top hits

kxkpmulp  posted on 2022-09-20  in ElasticSearch
Follows (0) | Answers (3) | Views (209)

I have the following data:

{"action":"CREATE","docs":1,"date":"2016 Jun 26 12:00:12","userid":"1234"}
{"action":"REPLACE","docs":2,"date":"2016 Jun 27 12:00:12","userid":"1234"}
{"action":"REPLACE","docs":1,"date":"2016 Jun 27 13:00:12","userid":"1234"}
{"action":"CREATE","docs":1,"date":"2016 Jun 28 12:00:12","userid":"3431"}
{"action":"REPLACE","docs":2,"date":"2016 Jun 28 13:00:12","userid":"3431"}
{"action":"CREATE","docs":1,"date":"2016 Jun 29 12:00:12","userid":"9999"}

To get the records of each unique user sorted by date (descending), I used Top Hits as shown below:

"aggs": {
  "user_bucket": {
    "terms": {
      "field": "userid"
    },
    "aggs": {
      "user_latest_count": {
        "top_hits": {
          "size": 1,
          "sort": [
            {
              "date": {
                "order": "desc"
              }
            }
          ],
          "_source": {
            "include": [
              "docs"
            ]
          }
        }
      }
    }
  }
}

The query above returns the following:

{"action":"REPLACE","docs":1,"date":"2016 Jun 27 13:00:12","userid":"1234"}
{"action":"REPLACE","docs":2,"date":"2016 Jun 28 13:00:12","userid":"3431"}
{"action":"CREATE","docs":1,"date":"2016 Jun 29 12:00:12","userid":"9999"}

Now I want to aggregate this further, so that the result looks like this:

{"sum_of_different_buckets": 4}

But I am not sure how to sum the values of the "docs" field from the results obtained above.
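
To pin down the expected result, here is a minimal Python sketch of the computation being asked for: keep the most recent record per userid, then sum their docs values. It runs on the six sample records only and does not talk to Elasticsearch; the function names are made up for illustration.

```python
from datetime import datetime

# The six sample records from the question (action field omitted).
RECORDS = [
    {"docs": 1, "date": "2016 Jun 26 12:00:12", "userid": "1234"},
    {"docs": 2, "date": "2016 Jun 27 12:00:12", "userid": "1234"},
    {"docs": 1, "date": "2016 Jun 27 13:00:12", "userid": "1234"},
    {"docs": 1, "date": "2016 Jun 28 12:00:12", "userid": "3431"},
    {"docs": 2, "date": "2016 Jun 28 13:00:12", "userid": "3431"},
    {"docs": 1, "date": "2016 Jun 29 12:00:12", "userid": "9999"},
]


def parse_date(text):
    # Matches the "2016 Jun 26 12:00:12" layout used in the sample data.
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def latest_per_user(records):
    # Keep, for each userid, the record with the greatest date.
    latest = {}
    for rec in records:
        current = latest.get(rec["userid"])
        if current is None or parse_date(rec["date"]) > parse_date(current["date"]):
            latest[rec["userid"]] = rec
    return latest


def sum_of_latest_docs(records):
    # Sum the docs value of each user's most recent record.
    return sum(rec["docs"] for rec in latest_per_user(records).values())


print({"sum_of_different_buckets": sum_of_latest_docs(RECORDS)})
```

For the sample data the latest records carry docs values 1, 2 and 1, which is where the desired total of 4 comes from.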

3wabscal #1

You can also nest aggregations arbitrarily inside other aggregations to extract the summarized data you need. Maybe the sample below works.

"aggs" : {
    "sum_of_different_buckets" : { "sum" : { "field" : "docs" } }
}
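
A quick check against the sample data, assuming this sum is placed at the top level of the request: a sum aggregation adds up docs over every matching document, not only the latest one per user. The Python sketch below just reproduces that arithmetic on the six sample values; the variable names are illustrative.

```python
# docs values of the six sample records, in the order they appear in the question.
DOCS_VALUES = [1, 2, 1, 1, 2, 1]

# A top-level sum considers every document, regardless of user or date.
total_all_documents = sum(DOCS_VALUES)

print(total_all_documents)
```

Under that assumption the total comes out as 8 rather than the 4 the question asks for.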

vfwfrxfs #2

You can have other aggregations at the same level as top_hits, but you cannot have any sub-aggregation under top_hits. Elasticsearch does not support it; there is a GitHub issue about this.

But if you want the sum at the same level, you can use the approach below.

"aggs": {
    "top_hits_agg": {
        "top_hits": {
            "size": 10,
            "_source": {
              "includes": ["docs"]
            }
        }
    },
    "sum_agg": {
        "sum": {
            "field": "docs"
        }
    }
}
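
The snippet does not say where these two aggregations go. Assuming they are placed under the user_bucket terms aggregation from the question, the sum is computed per user over all of that user's documents, independently of what top_hits returns. A small Python sketch of that per-bucket arithmetic (names are illustrative, no Elasticsearch involved):

```python
from collections import defaultdict

# (userid, docs) pairs for the six sample records from the question.
ROWS = [
    ("1234", 1), ("1234", 2), ("1234", 1),
    ("3431", 1), ("3431", 2),
    ("9999", 1),
]


def sum_per_user(rows):
    # Mimics a sum on "docs" nested inside a terms bucket on "userid".
    totals = defaultdict(int)
    for userid, docs in rows:
        totals[userid] += docs
    return dict(totals)


print(sum_per_user(ROWS))
```

Each bucket then holds the total for all of the user's records (4, 3 and 1 for the sample data), so this gives a per-user sum but not the sum of the latest values.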

osh3o9ms #3

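You can use a scripted metric together with a sum_bucket pipeline aggregation. The scripted_metric aggregation lets you write your own map-reduce logic, so you can return a single metric for each term.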

POST rahul_test/_search
{
  "size": 0,
  "aggs": {
    "user_bucket": {
      "terms": {
        "field": "userid",
        "size": 10000,
        "min_doc_count": 1
      },
      "aggs": {
        "user_latest_count": {
          "scripted_metric": {
            "init_script": "state.timestamp_latest = 0L; state.last_value = 0",
            "map_script": "def date_as_millis = doc['date'].getValue().toInstant().toEpochMilli(); if (date_as_millis > state.timestamp_latest) { state.timestamp_latest = date_as_millis; state.last_value = doc.docs.value;}",
            "combine_script": "return state",
            "reduce_script": "def last_value = 0; def timestamp_latest = 0L; for (s in states) {if (s.timestamp_latest > (timestamp_latest)) {timestamp_latest = s.timestamp_latest; last_value = s.last_value;}} return last_value;"
          }
        }
      }
    },
    "sum_user_latest_counts": {
      "sum_bucket": {
        "buckets_path": "user_bucket>user_latest_count.value"
      }
    }
  }
}
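  • init_script creates two fields, timestamp_latest and last_value, in the state object (one state object per shard).
  • map_script runs once for each document collected in the bucket returned by the parent terms aggregation. It derives date_as_millis from the document's date, compares date_as_millis with state.timestamp_latest and, if the document is newer, updates state.timestamp_latest and state.last_value for that shard.
  • combine_script returns the state of each shard.
  • reduce_script iterates over the states returned by the shards, compares their s.timestamp_latest values, and returns the single value (last_value) from the document with the latest timestamp.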
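
The four phases above can be mimicked in plain Python to see how the per-shard states combine. This is only a sketch of the logic in the Painless scripts, not of how Elasticsearch executes them; the two-shard split of user 1234's documents is hypothetical and chosen for illustration.

```python
from datetime import datetime, timezone


def to_millis(text):
    # Parse the sample date layout and convert it to epoch milliseconds (UTC assumed).
    parsed = datetime.strptime(text, "%Y %b %d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def init_state():
    # init_script: one fresh state object per shard.
    return {"timestamp_latest": 0, "last_value": 0}


def map_doc(state, doc):
    # map_script: called once per document, keeps the newest docs value seen so far.
    date_as_millis = to_millis(doc["date"])
    if date_as_millis > state["timestamp_latest"]:
        state["timestamp_latest"] = date_as_millis
        state["last_value"] = doc["docs"]


def combine(state):
    # combine_script: hand the shard's state back unchanged.
    return state


def reduce_states(states):
    # reduce_script: pick the value belonging to the latest timestamp across shards.
    last_value = 0
    timestamp_latest = 0
    for s in states:
        if s["timestamp_latest"] > timestamp_latest:
            timestamp_latest = s["timestamp_latest"]
            last_value = s["last_value"]
    return last_value


def scripted_metric(shards):
    # Run init/map/combine on every shard, then reduce the collected states.
    states = []
    for shard_docs in shards:
        state = init_state()
        for doc in shard_docs:
            map_doc(state, doc)
        states.append(combine(state))
    return reduce_states(states)


# Hypothetical placement of user 1234's three documents on two shards.
SHARDS_FOR_1234 = [
    [{"docs": 1, "date": "2016 Jun 26 12:00:12"}, {"docs": 1, "date": "2016 Jun 27 13:00:12"}],
    [{"docs": 2, "date": "2016 Jun 27 12:00:12"}],
]

print(scripted_metric(SHARDS_FOR_1234))
```

The second shard reports docs = 2, but its newest timestamp is older than the first shard's, so the reduce step keeps the value from the first shard.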

At this point we have the latest docs value for each userid. We then use the sum_bucket pipeline aggregation to sum all of those latest docs values, which returns 4.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "user_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1234",
          "doc_count" : 3,
          "user_latest_count" : {
            "value" : 1
          }
        },
        {
          "key" : "3431",
          "doc_count" : 2,
          "user_latest_count" : {
            "value" : 2
          }
        },
        {
          "key" : "9999",
          "doc_count" : 1,
          "user_latest_count" : {
            "value" : 1
          }
        }
      ]
    },
    "sum_user_latest_counts" : {
      "value" : 4.0
    }
  }
}
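
As a sanity check on the response, the sum_bucket value should equal the sum of the per-bucket user_latest_count values. The Python sketch below recomputes it from a trimmed copy of the aggregations section above; the helper name sum_bucket is only illustrative and is not a client API.

```python
# Trimmed copy of the "aggregations" section of the response shown above.
RESPONSE_AGGS = {
    "user_bucket": {
        "buckets": [
            {"key": "1234", "doc_count": 3, "user_latest_count": {"value": 1}},
            {"key": "3431", "doc_count": 2, "user_latest_count": {"value": 2}},
            {"key": "9999", "doc_count": 1, "user_latest_count": {"value": 1}},
        ]
    },
    "sum_user_latest_counts": {"value": 4.0},
}


def sum_bucket(aggs, parent, metric):
    # Follows a path like "user_bucket>user_latest_count.value" by hand:
    # take each bucket of the parent aggregation and add up the metric's value.
    return float(sum(bucket[metric]["value"] for bucket in aggs[parent]["buckets"]))


recomputed = sum_bucket(RESPONSE_AGGS, "user_bucket", "user_latest_count")
print(recomputed == RESPONSE_AGGS["sum_user_latest_counts"]["value"])
```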
