Elasticsearch - Is it possible to collapse results first and then aggregate the data of a nested field?

Posted by bsxbgnwa on 2023-02-18 in ElasticSearch

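I'm using Elasticsearch and want to group search results by a specific field, returning the top n documents per group. Each document has a nested field, and for each group I want to aggregate that nested field across all of the group's documents.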

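Example

I have 5 documents, each with a groupId and a nested field people. I want to group these documents by groupId. Then, for each group, I want to get the top 2 people (some documents may contain the same person).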

PUT test/_mapping
{
  "properties": {
    "groupId": {
      "type": "keyword"
    },
    "id": {
      "type": "keyword"
    },
    "name": {
      "type": "text"
    },
    "people": {
      "type": "nested",
      "properties": {
        "email": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "name": "docs1",
  "groupId": "1",
  "people":[{
    "email":"people1@test.com"
  }]
}

PUT test/_doc/2
{
  "name": "docs2",
  "groupId": "1",
  "people":[{
    "email":"people2.1@test.com"
  },
  {
    "email":"people2.2@test.com"
  }]
}

PUT test/_doc/3
{
  "name": "docs3",
  "groupId": "2",
  "people":[{
    "email":"people3.1@test.com"
  },
  {
    "email":"people2.2@test.com"
  }]
}

PUT test/_doc/4
{
  "name": "docs4",
  "groupId": "1",
  "people":[{
    "email":"people4.1@test.com"
  },
  {
    "email":"people4.2@test.com"
  }]
}

PUT test/_doc/5
{
  "name": "docs5",
  "groupId": "3",
  "people":[{
    "email":"people5.1@test.com"
  },
  {
    "email":"people5.2@test.com"
  }]
}

Search query

GET test/_search
{
  "collapse": {
    "field": "groupId",
    "inner_hits": {
      "name":"inner",
      "size": 2
    }
  },
  "sort": [
    {
      "groupId": {
        "order": "asc"
      }
    }
  ],
  "size": 2,
  "from": 0
}

Result

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "docs1",
          "groupId": "1",
          "people": [
            {
              "email": "people1@test.com"
            }
          ]
        },
        "fields": {
          "groupId": [
            "1"
          ]
        },
        "sort": [
          "1"
        ],
        "inner_hits": {
          "inner": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "test",
                  "_id": "1",
                  "_score": 0,
                  "_source": {
                    "name": "docs1",
                    "groupId": "1",
                    "people": [
                      {
                        "email": "people1@test.com"
                      }
                    ]
                  }
                },
                {
                  "_index": "test",
                  "_id": "2",
                  "_score": 0,
                  "_source": {
                    "name": "docs2",
                    "groupId": "1",
                    "people": [
                      {
                        "email": "people2.1@test.com"
                      },
                      {
                        "email": "people2.2@test.com"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "test",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "docs3",
          "groupId": "2",
          "people": [
            {
              "email": "people3.1@test.com"
            },
            {
              "email": "people2.2@test.com"
            }
          ]
        },
        "fields": {
          "groupId": [
            "2"
          ]
        },
        "sort": [
          "2"
        ],
        "inner_hits": {
          "inner": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "test",
                  "_id": "3",
                  "_score": 0,
                  "_source": {
                    "name": "docs3",
                    "groupId": "2",
                    "people": [
                      {
                        "email": "people3.1@test.com"
                      },
                      {
                        "email": "people2.2@test.com"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

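I expect a groupPeople field to be aggregated for each group, containing that group's top n people. It should not be limited by the inner_hits size: for example, groupId=1 contains 3 documents and 5 people, even though only 2 of those documents are returned as inner hits.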

Answer #1, by wa7juj8i

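The query you are looking for is shown below: a terms aggregation on groupId, with a nested aggregation on people inside it and a terms sub-aggregation on people.email that keeps the top 2 emails per group: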

POST test/_search 
{
  "size": 0,
  "aggs": {
    "groups": {
      "terms": {
        "field": "groupId",
        "size": 10
      },
      "aggs": {
        "people": {
          "nested": {
            "path": "people"
          },
          "aggs": {
            "emails": {
              "terms": {
                "field": "people.email",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}
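
Since the question also wants the top documents per group, one option is to send the collapse and the aggregation in a single request. As far as I know, field collapsing is applied to the top hits only and does not affect aggregations, so the buckets should still cover every document in each group regardless of the inner_hits size. The following is only a sketch that merges the question's search with the aggregation above; it is not part of the original answer:

POST test/_search
{
  "size": 2,
  "collapse": {
    "field": "groupId",
    "inner_hits": {
      "name": "inner",
      "size": 2
    }
  },
  "sort": [
    { "groupId": { "order": "asc" } }
  ],
  "aggs": {
    "groups": {
      "terms": { "field": "groupId", "size": 10 },
      "aggs": {
        "people": {
          "nested": { "path": "people" },
          "aggs": {
            "emails": {
              "terms": { "field": "people.email", "size": 2 }
            }
          }
        }
      }
    }
  }
}

Each collapsed hit can then be matched to its bucket by groupId on the client side. Two caveats: the hits are limited by the top-level size (2 groups here) while the groups aggregation returns up to 10 buckets, and the doc_count under emails counts nested people objects rather than parent documents.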

If you need to page through the groups, the same nested aggregation can be placed under a composite aggregation:

POST test/_search 
{
  "size": 0,
  "aggs": {
    "pages": {
      "composite": {
        "sources": [
          {
            "groups": {
              "terms": {
                "field": "groupId"
              }
            }
          }
        ]
      },
      "aggs": {
        "people": {
          "nested": {
            "path": "people"
          },
          "aggs": {
            "emails": {
              "terms": {
                "field": "people.email",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}
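
To fetch the next page, a composite aggregation accepts a size and an after parameter; each response includes an after_key that is passed back as after in the following request. A sketch, assuming a page size of 2 and that the previous page ended at groupId "2" (both values are chosen only for illustration):

POST test/_search
{
  "size": 0,
  "aggs": {
    "pages": {
      "composite": {
        "size": 2,
        "after": { "groups": "2" },
        "sources": [
          {
            "groups": {
              "terms": {
                "field": "groupId"
              }
            }
          }
        ]
      },
      "aggs": {
        "people": {
          "nested": {
            "path": "people"
          },
          "aggs": {
            "emails": {
              "terms": {
                "field": "people.email",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}

With the five sample documents, this should return only the bucket for groupId "3". When a response no longer contains an after_key, there are no more pages.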
