ElasticSearch中状态相同的最新记录的聚合

jfewjypa  于 2022-12-29  发布在  ElasticSearch
关注(0)|答案(1)|浏览(112)

我在ElasticSearch索引some_index中有以下数据。

[ {
      "_index": "some_index",
        "_source": {
          "cart": {
            "cart_id": 1,
            "cart_status": "new",
            "grandTotal": 12,
            "event": "some_event",
            "timestamp": "2022-12-01T00:00:00.000Z"
          }
        }
      },
      {
        "_index": "some_index",
      "_source": {
        "cart": {
          "cart_id": 1,
          "cart_status": "paid",
          "grandTotal": 12,
          "event": "some_event",
          "timestamp": "2022-12-02T00:00:00.000Z"
        }
      }
    },
    {
      "_index": "some_index",
    "_source": {
      "cart": {
        "cart_id": 2,
        "cart_status": "new",
        "grandTotal": 23,
        "event": "some_event",
        "timestamp": "2022-12-01T00:00:00.000Z"
      }
    }
  },
  {
  "_index": "some_index",
  "_source": {
    "cart": {
      "cart_id": 2,
      "cart_status": "paid",
      "grandTotal": 23,
      "event": "some_event",
      "timestamp": "2022-12-04T00:00:00.000Z"
    }
  }
},
{
  "_index": "some_index",
"_source": {
  "cart": {
    "cart_id": 3,
    "cart_status": "new",
    "grandTotal": 17,
    "event": "some_event",
    "timestamp": "2022-12-01T00:00:00.000Z"
  }
}
},
{
  "_index": "some_index",
"_source": {
  "cart": {
    "cart_id": 3,
    "cart_status": "new",
    "grandTotal": 17,
    "event": "some_event",
    "timestamp": "2022-12-04T00:00:00.000Z"
  }
}
}
]

我想得到的是grandTotals与给定时间范围内每个购物车的最新cart_status之和。
根据上面的示例,timestamp >= 2022-12-01 00:00:00 and timestamp<= 2022-12-03 00:00:00的结果应该类似于
cart_status:new, sum grandTotal: 40,因为在该时间范围内,最新状态new具有cart_id3和2。
cart_status:paid, sum grandTotal: 12以及这个,因为已支付是仅cart_id=1的最新状态。
我尝试使用sub-aggregation on top_result, top_hits,但ElasticSearch抱怨"Aggregator [top_result] of type [top_hits] cannot accept sub-aggregations"
此外,我还尝试使用collapse来获取最新的状态,但根据文档,也不可能对collapse的结果进行聚合。
有人能帮我解决这个问题吗,这似乎是一个常见的计算,但在ElasticSearch中不是很微不足道。
在SQL中,使用窗口函数很容易做到这一点。
我想避免将中间数据持久化到另一个索引中。因为我需要动态查询,因为用户可能希望获得任何时间范围的计算结果。

2uluyalo

2uluyalo1#

你可以尝试这下面的方法. whitember,为card_status,sum值将是52因为它包括card_id 1它有"新"作为卡状态连同2和3为给定的时间戳.

    • Map:**
PUT some_index
{
  "mappings" : {
    "properties": {
      "timestamp" : {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time ||epoch_millis"
      },
      "cart_id" : {
        "type": "keyword"
      },
      "cart_status" : {
        "type": "keyword"
      },
      "grand_total" : {
        "type": "long"
      },
      "event":{
        "type": "keyword"
      }
    }
  }
}
    • 批量插入:**
POST _bulk
{ "index" : { "_index" : "some_index", "_id" : "1" } }
{ "cart_id" : "1" , "grand_total":12, "cart_status" : "new","timestamp":"2022-12-01T00:00:00.000Z", "event" : "some_event"}
{ "index" : { "_index" : "some_index", "_id" : "2" } }
{ "cart_id" : "1" , "grand_total":12, "cart_status" : "paid","timestamp":"2022-12-02T00:00:00.000Z", "event" : "some_event"}
{ "index" : { "_index" : "some_index", "_id" : "3" } }
{ "cart_id" : "2" , "grand_total":23, "cart_status" : "new","timestamp":"2022-12-01T00:00:00.000Z", "event" : "some_event"}
{ "index" : { "_index" : "some_index", "_id" : "4" } }
{ "cart_id" : "2" , "grand_total":23, "cart_status" : "paid","timestamp":"2022-12-04T00:00:00.000Z", "event" : "some_event"}
{ "index" : { "_index" : "some_index", "_id" : "5" } }
{ "cart_id" : "3" , "grand_total":17, "cart_status" : "new","timestamp":"2022-12-01T00:00:00.000Z", "event" : "some_event"}
{ "index" : { "_index" : "some_index", "_id" : "6" } }
{ "cart_id" : "3" , "grand_total":17, "cart_status" : "new","timestamp":"2022-12-04T00:00:00.000Z", "event" : "some_event"}
    • 查询:**
GET some_index/_search
{
    "size":0, 
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "timestamp": {
                            "gte": "2022-12-01 00:00:00",
                            "lte": "2022-12-03 00:00:00"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
    "card_status": {
      "terms": {
        "field": "cart_status"
      },
      "aggs": {
        "grandTotal": {
          "sum": {
            "field": "grand_total"
          }
        }
      }
    }
  }
}
    • 输出:**
{
  "took": 86,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "card_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "new",
          "doc_count": 3,
          "grandTotal": {
            "value": 52
          }
        },
        {
          "key": "paid",
          "doc_count": 1,
          "grandTotal": {
            "value": 12
          }
        }
      ]
    }
  }
}

相关问题