How to get document values instead of document counts in an Elasticsearch bucket aggregation query

k0pti3hp  posted on 2023-06-29  in  Elasticsearch

I have four documents in my index.

{
            "_index": "my-index",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:12:00",
                "message": "INFO GET /search HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
        {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:15:00",
                "message": "Error GET /search HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
       {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:20:00",
                "message": "INFO GET /parse HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        },
        {
            "_index": "my-index",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
                "@timestamp": "2099-11-15T13:26:00",
                "message": "Error GET /parse HTTP/1.1 200 1070000",
                "user": {
                    "id": "test@gmail.com"
                }
            }
        }

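I wrote a bucket aggregation query that uses filters to group all documents in the index by message type (info or error). In the example above, the index contains four documents: two have messages of type "info" and the other two have messages of type "error".
I want a bucket aggregation query that returns the results grouped by message type. The expected result is two buckets with two documents each. However, my query returns only the document count for each bucket, not the documents themselves.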

The query I am using is:

{
   "size":0,
   "aggs" : {
     "messages" : {
       "filters" : {
          "filters" : {
             "info" :   { "match" : { "message" : "Info"   }},
             "error" : { "match" : { "message" : "Error"   }}
          }
        }
     }
  }
}

The output of the above query is:

{
"took": 3,
"timed_out": false,
"_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
},
"hits": {
    "total": {
        "value": 2,
        "relation": "eq"
    },
    "max_score": null,
    "hits": []
},
"aggregations": {
    "messages": {
        "buckets": {
            "error": {
                "doc_count": 2
            },
            "info": {
                "doc_count": 2
            }
        }
    }
}
   }

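But my requirement is to get the actual documents, with their field values, inside each bucket. Is there a way to change the filters bucket aggregation query so that each bucket contains its documents?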

qv7cva1a  #1

You can use the top_hits aggregation to get the corresponding documents inside each bucket:

{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "info": {
            "match": {
              "message": "Info"
            }
          },
          "error": {
            "match": {
              "message": "Error"
            }
          }
        }
      },
      "aggs": {
        "top_filters_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "message",
                "user.id"
              ]
            }
          }
        }
      }
    }
  }
}
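
One detail worth noting: top_hits returns at most three hits per bucket by default, so a bucket with more documents would be truncated unless `size` is set. The following is a minimal Python sketch that builds the same request body with an explicit `size`. The helper name `build_filters_top_hits_body` and the value `size=10` are assumptions for illustration, not part of the answer; the resulting dict can be sent as the body of a `_search` request by whatever client you use.

```python
import json


def build_filters_top_hits_body(filters, source_fields, size=10):
    """Build a search body: a `filters` aggregation with a `top_hits` sub-aggregation.

    `filters` maps a bucket name to the text matched against the `message` field.
    `size` is the maximum number of documents returned per bucket (hypothetical default).
    """
    return {
        "size": 0,  # suppress the normal hit list; only aggregations are returned
        "aggs": {
            "messages": {
                "filters": {
                    "filters": {
                        name: {"match": {"message": text}}
                        for name, text in filters.items()
                    }
                },
                "aggs": {
                    "top_filters_hits": {
                        "top_hits": {
                            "size": size,
                            "_source": {"includes": list(source_fields)},
                        }
                    }
                },
            }
        },
    }


body = build_filters_top_hits_body(
    {"info": "Info", "error": "Error"},
    ["message", "user.id"],
    size=10,
)
print(json.dumps(body, indent=2))
```

With only two documents per bucket, as in the question, the default of three is already enough; the explicit `size` matters once buckets grow.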

The search result will be:

"aggregations": {
    "messages": {
      "buckets": {
        "error": {
          "doc_count": 2,
          "top_filters_hits": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "67033379",
                  "_type": "_doc",
                  "_id": "2",
                  "_score": 1.0,
                  "_source": {
                    "message": "Error GET /search HTTP/1.1 200 1070000",
                    "user": {
                      "id": "test@gmail.com"
                    }
                  }
                },
                {
                  "_index": "67033379",
                  "_type": "_doc",
                  "_id": "4",
                  "_score": 1.0,
                  "_source": {
                    "message": "Error GET /parse HTTP/1.1 200 1070000",
                    "user": {
                      "id": "test@gmail.com"
                    }
                  }
                }
              ]
            }
          }
        },
        "info": {
          "doc_count": 2,
          "top_filters_hits": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "67033379",
                  "_type": "_doc",
                  "_id": "1",
                  "_score": 1.0,
                  "_source": {
                    "message": "INFO GET /search HTTP/1.1 200 1070000",
                    "user": {
                      "id": "test@gmail.com"
                    }
                  }
                },
                {
                  "_index": "67033379",
                  "_type": "_doc",
                  "_id": "3",
                  "_score": 1.0,
                  "_source": {
                    "message": "INFO GET /parse HTTP/1.1 200 1070000",
                    "user": {
                      "id": "test@gmail.com"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
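
To turn a response like this into plain lists of documents, the nested `hits.hits` array of each bucket can be unpacked on the client side. The sketch below assumes the response has already been parsed into a Python dict; the function name `docs_by_bucket` is hypothetical, and `sample_response` is a trimmed copy of the result above, not new data.

```python
def docs_by_bucket(response, agg_name="messages", hits_name="top_filters_hits"):
    """Map each bucket key of a keyed `filters` aggregation to its `_source` documents."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {
        key: [hit["_source"] for hit in bucket[hits_name]["hits"]["hits"]]
        for key, bucket in buckets.items()
    }


# Trimmed copy of the response shown above (only the fields the helper reads).
sample_response = {
    "aggregations": {
        "messages": {
            "buckets": {
                "error": {
                    "doc_count": 2,
                    "top_filters_hits": {"hits": {"hits": [
                        {"_id": "2", "_source": {"message": "Error GET /search HTTP/1.1 200 1070000"}},
                        {"_id": "4", "_source": {"message": "Error GET /parse HTTP/1.1 200 1070000"}},
                    ]}},
                },
                "info": {
                    "doc_count": 2,
                    "top_filters_hits": {"hits": {"hits": [
                        {"_id": "1", "_source": {"message": "INFO GET /search HTTP/1.1 200 1070000"}},
                        {"_id": "3", "_source": {"message": "INFO GET /parse HTTP/1.1 200 1070000"}},
                    ]}},
                },
            }
        }
    }
}

grouped = docs_by_bucket(sample_response)
for key, docs in grouped.items():
    print(key, [doc["message"] for doc in docs])
```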

eni9jsuy  #2

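To group documents by a field, we can use a terms aggregation.
But if you want to see every document in each category, add a sub-aggregation inside it: another terms aggregation on a unique field, the obvious choice being '_id'.
Because that sub-aggregation groups by '_id', each of its buckets contains exactly one document.
Then use a top_hits aggregation to extract the actual document data; its size is always 1.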

{

 // 1) Categorize the data by a field

  "aggs": {
    "FIELD-LEVEL-AGGs-1": {
      "terms": {
        "field": "field-category"
      },

 // 2) Use '_id' level aggs if you want all data, otherwise use top_hits directly
 //    Basically we are flattening the aggs from step 1

      "aggs": {
        "DOC-LEVEL-AGGs-2": {
          "terms": {
            "field": "_id"
          },

 // 3) Get the actual doc data with top_hits, nested inside the '_id' level aggs

          "aggs": {
            "DOC-DATA-AGGs-3": {
              "top_hits": {
                "size": 1,
                "_source": {
                  "includes": [
                    "field-category",
                    "field-name"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
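
The response to a three-level aggregation like this is nested one level deeper than with a plain top_hits: each category bucket holds a list of '_id' buckets, and each of those holds a single hit. A minimal Python sketch of flattening it back into one list of documents per category follows. The function name `docs_by_category` is hypothetical, and `sample_response` is made-up illustrative data that only mirrors the bucket layout a terms aggregation returns (a `buckets` list whose entries carry a `key`).

```python
def docs_by_category(
    response,
    category_agg="FIELD-LEVEL-AGGs-1",
    doc_agg="DOC-LEVEL-AGGs-2",
    hits_agg="DOC-DATA-AGGs-3",
):
    """Collect the `_source` of every document under each category bucket."""
    result = {}
    for category in response["aggregations"][category_agg]["buckets"]:
        docs = []
        for doc_bucket in category[doc_agg]["buckets"]:
            # Each '_id' bucket contains exactly one hit because `size` is 1.
            docs.extend(hit["_source"] for hit in doc_bucket[hits_agg]["hits"]["hits"])
        result[category["key"]] = docs
    return result


# Made-up sample response, for illustration only.
sample_response = {
    "aggregations": {
        "FIELD-LEVEL-AGGs-1": {
            "buckets": [
                {
                    "key": "error",
                    "doc_count": 2,
                    "DOC-LEVEL-AGGs-2": {"buckets": [
                        {"key": "2", "doc_count": 1, "DOC-DATA-AGGs-3": {"hits": {"hits": [
                            {"_id": "2", "_source": {"field-category": "error", "field-name": "a"}}]}}},
                        {"key": "4", "doc_count": 1, "DOC-DATA-AGGs-3": {"hits": {"hits": [
                            {"_id": "4", "_source": {"field-category": "error", "field-name": "b"}}]}}},
                    ]},
                },
                {
                    "key": "info",
                    "doc_count": 1,
                    "DOC-LEVEL-AGGs-2": {"buckets": [
                        {"key": "1", "doc_count": 1, "DOC-DATA-AGGs-3": {"hits": {"hits": [
                            {"_id": "1", "_source": {"field-category": "info", "field-name": "c"}}]}}},
                    ]},
                },
            ]
        }
    }
}

flattened = docs_by_category(sample_response)
print(flattened)
```

A terms aggregation returns only the top ten buckets by default, so for categories with more than ten documents the `size` of the '_id' level terms aggregation would need to be raised as well.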
