Elasticsearch - Is it possible to collapse first, then aggregate data of a nested field?

Posted by bsxbgnwa on 2023-02-18 in ElasticSearch

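I am using Elasticsearch and want to group search results by a specific field, returning the top n documents per group. Each document has a nested field, and for each group I want to aggregate that nested field across all of the group's documents.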

Example

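I have 5 documents, each with a groupId and a nested field people. I want to group these documents by groupId. Then, for each group, I want to get the top 2 people (some documents may contain the same person).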

PUT test/_mapping
{
  "properties": {
    "groupId": {
      "type": "keyword"
    },
    "id": {
      "type": "keyword"
    },
    "name": {
      "type": "text"
    },
    "people": {
      "type": "nested",
      "properties": {
        "email": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "name": "docs1",
  "groupId": "1",
  "people":[{
    "email":"people1@test.com"
  }]
}

PUT test/_doc/2
{
  "name": "docs2",
  "groupId": "1",
  "people":[{
    "email":"people2.1@test.com"
  },
  {
    "email":"people2.2@test.com"
  }]
}

PUT test/_doc/3
{
  "name": "docs3",
  "groupId": "2",
  "people":[{
    "email":"people3.1@test.com"
  },
  {
    "email":"people2.2@test.com"
  }]
}

PUT test/_doc/4
{
  "name": "docs4",
  "groupId": "1",
  "people":[{
    "email":"people4.1@test.com"
  },
  {
    "email":"people4.2@test.com"
  }]
}

PUT test/_doc/5
{
  "name": "docs5",
  "groupId": "3",
  "people":[{
    "email":"people5.1@test.com"
  },
  {
    "email":"people5.2@test.com"
  }]
}

Search query

GET test/_search
{
  "collapse": {
    "field": "groupId",
    "inner_hits": {
      "name":"inner",
      "size": 2
    }
  },
  "sort": [
    {
      "groupId": {
        "order": "asc"
      }
    }
  ],
  "size": 2,
  "from": 0
}

Result

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "docs1",
          "groupId": "1",
          "people": [
            {
              "email": "people1@test.com"
            }
          ]
        },
        "fields": {
          "groupId": [
            "1"
          ]
        },
        "sort": [
          "1"
        ],
        "inner_hits": {
          "inner": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "test",
                  "_id": "1",
                  "_score": 0,
                  "_source": {
                    "name": "docs1",
                    "groupId": "1",
                    "people": [
                      {
                        "email": "people1@test.com"
                      }
                    ]
                  }
                },
                {
                  "_index": "test",
                  "_id": "2",
                  "_score": 0,
                  "_source": {
                    "name": "docs2",
                    "groupId": "1",
                    "people": [
                      {
                        "email": "people2.1@test.com"
                      },
                      {
                        "email": "people2.2@test.com"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "test",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "docs3",
          "groupId": "2",
          "people": [
            {
              "email": "people3.1@test.com"
            },
            {
              "email": "people2.2@test.com"
            }
          ]
        },
        "fields": {
          "groupId": [
            "2"
          ]
        },
        "sort": [
          "2"
        ],
        "inner_hits": {
          "inner": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "test",
                  "_id": "3",
                  "_score": 0,
                  "_source": {
                    "name": "docs3",
                    "groupId": "2",
                    "people": [
                      {
                        "email": "people3.1@test.com"
                      },
                      {
                        "email": "people2.2@test.com"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

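Expected: a groupPeople field aggregated for each group, containing the top n people of that group. It should not be limited by the inner_hits size; for example, groupId=1 contains 3 documents and 5 people.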
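
To make the expected output concrete, the sketch below computes it in plain Python from the five sample documents, without Elasticsearch. The helper name `top_people_per_group` and the tie-breaking rule (count descending, then email ascending) are assumptions made for illustration; the question does not say how ties should be ordered.

```python
from collections import Counter, defaultdict

# The five sample documents from the question, reduced to the fields that matter.
DOCS = [
    {"groupId": "1", "people": ["people1@test.com"]},
    {"groupId": "1", "people": ["people2.1@test.com", "people2.2@test.com"]},
    {"groupId": "2", "people": ["people3.1@test.com", "people2.2@test.com"]},
    {"groupId": "1", "people": ["people4.1@test.com", "people4.2@test.com"]},
    {"groupId": "3", "people": ["people5.1@test.com", "people5.2@test.com"]},
]


def top_people_per_group(docs, n):
    """Return {groupId: [(email, count), ...]} with at most n entries per group."""
    counts = defaultdict(Counter)
    for doc in docs:
        # Count every nested person of every document in the group,
        # independent of how many documents a collapse would return.
        counts[doc["groupId"]].update(doc["people"])
    return {
        # Assumed tie-break: higher count first, then email in ascending order.
        group: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for group, counter in counts.items()
    }


result = top_people_per_group(DOCS, 2)
people_in_group_1 = sum(len(d["people"]) for d in DOCS if d["groupId"] == "1")
```

Here `people_in_group_1` is 5, matching the "3 documents and 5 people" figure for groupId=1, while `result["1"]` still holds only the top 2 entries.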

elasticsearch

Source: https://stackoverflow.com/questions/75458546/elasticsearch-is-it-possible-to-collapse-first-then-aggregate-data-of-a-nested

1 Answer

wa7juj8i1#

The query you are looking for looks like this:

POST test/_search 
{
  "size": 0,
  "aggs": {
    "groups": {
      "terms": {
        "field": "groupId",
        "size": 10
      },
      "aggs": {
        "people": {
          "nested": {
            "path": "people"
          },
          "aggs": {
            "emails": {
              "terms": {
                "field": "people.email",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}
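
The response to this request nests the email buckets two levels deep: `aggregations.groups.buckets[]` holds one bucket per group, and each of those carries `people.emails.buckets[]`. A minimal Python sketch for flattening that shape follows. The function name `emails_by_group` is hypothetical, and `sample_response` is a hand-written fragment in the shape that terms and nested aggregations return, not output captured from a cluster.

```python
def emails_by_group(response):
    """Map each groupId to its document count, people count, and top emails."""
    flattened = {}
    for group in response["aggregations"]["groups"]["buckets"]:
        # "people" is the nested aggregation; its doc_count is the number of
        # nested people objects in the group, not the number of documents.
        people = group["people"]
        flattened[group["key"]] = {
            "documents": group["doc_count"],
            "people": people["doc_count"],
            "top_emails": [b["key"] for b in people["emails"]["buckets"]],
        }
    return flattened


# Hand-written fragment (assumption): only the fields read above are included.
sample_response = {
    "aggregations": {
        "groups": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 3,
                    "people": {
                        "doc_count": 5,
                        "emails": {
                            "buckets": [
                                {"key": "people1@test.com", "doc_count": 1},
                                {"key": "people2.1@test.com", "doc_count": 1},
                            ]
                        },
                    },
                }
            ]
        }
    }
}

summary = emails_by_group(sample_response)
```

Because the inner terms aggregation runs over every nested object in the group, the email counts do not depend on any inner_hits size.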

If you need pagination, you can achieve the same with a composite aggregation:

POST test/_search 
{
  "size": 0,
  "aggs": {
    "pages": {
      "composite": {
        "sources": [
          {
            "groups": {
              "terms": {
                "field": "groupId"
              }
            }
          }
        ]
      },
      "aggs": {
        "people": {
          "nested": {
            "path": "people"
          },
          "aggs": {
            "emails": {
              "terms": {
                "field": "people.email",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}
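
A composite aggregation pages by key rather than by offset: each response carries an `after_key`, which is passed back as `after` in the next request, and `size` inside `composite` sets how many groups a page holds. A minimal Python sketch of building those request bodies follows, assuming a page size of 2 and the hypothetical helper names `build_page_request` and `next_after`; the response fragment is hand-written, not captured from a cluster.

```python
import copy

PAGE_SIZE = 2  # assumption: number of groups per page

BASE_REQUEST = {
    "size": 0,
    "aggs": {
        "pages": {
            "composite": {
                "size": PAGE_SIZE,
                "sources": [{"groups": {"terms": {"field": "groupId"}}}],
            },
            "aggs": {
                "people": {
                    "nested": {"path": "people"},
                    "aggs": {
                        "emails": {"terms": {"field": "people.email", "size": 2}}
                    },
                }
            },
        }
    },
}


def build_page_request(after_key=None):
    """Return the body for the first page, or for the page following after_key."""
    body = copy.deepcopy(BASE_REQUEST)  # never mutate the shared template
    if after_key is not None:
        body["aggs"]["pages"]["composite"]["after"] = after_key
    return body


def next_after(response):
    """Return the after_key of a composite response, or None when it is absent."""
    return response["aggregations"]["pages"].get("after_key")


first_page = build_page_request()

# Hand-written response fragment (assumption), showing only the fields of interest.
sample_response = {
    "aggregations": {
        "pages": {
            "after_key": {"groups": "2"},
            "buckets": [
                {"key": {"groups": "1"}, "doc_count": 3},
                {"key": {"groups": "2"}, "doc_count": 1},
            ],
        }
    }
}
second_page = build_page_request(next_after(sample_response))
```

Each body would be sent to `POST test/_search`; paging stops once a response comes back with no buckets.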

Answered 2023-02-18
