Elasticsearch aggregation: filter array by condition

Posted by vbopmzt1 on 2023-01-01 in ElasticSearch

My data looks like this:

[
    {
        "name": "Scott",
        "origin": "London",
        "travel": [
            {
                "active": false,
                "city": "Berlin",
                "visited": "2020-02-01"
            },
            {
                "active": true,
                "city": "Prague",
                "visited": "2020-02-15"
            }
        ]
    },
    {
        "name": "Lilly",
        "origin": "London",
        "travel": [
            {
                "active": true,
                "city": "Scotland",
                "visited": "2020-02-01"
            }
        ]
    }
]

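I want to run an aggregation in which each top-level origin is a bucket, followed by a nested aggregation that shows how many people are currently visiting each city. In other words, I only care what the city is *if* active is true.
With a filter, the whole travel array (both objects) is returned as soon as one of its objects has active set to true. I don't want to include cities where active is false.
Expected output: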

{
  "aggregations": {
    "origin": {
      "buckets": [
        {
          "key": "London",
          "buckets": [
            {
              "key": "travel",
              "doc_count": 2555,
              "buckets": [
                {
                  "key": "Scotland",
                  "doc_count": 1
                },
                {
                  "key": "Prague",
                  "doc_count": 1
                }
              ]
            }
          ]
        }
      ]
    }
  }
}

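In the output above there are only 2 counts under the travel aggregation, because only two travel objects have active set to true.
Currently my aggregation is set up as follows: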

{
  "from": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "travel": {
          "filter": {
            "term": {
              "travel.active": true
            }
          },
          "aggs": {
            "city": {
              "terms": {
                "field": "city"
              }
            }
          }
        }
      }
    }
  }
}

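I have a top-level aggregation on origin, then a nested aggregation on the travel array with a filter on travel.active = true, and then another nested aggregation that creates a bucket for each city.
The result still contains Berlin as a city, even though I filter on active = true.
My guess is that Berlin gets through because active: true holds for one of the objects in the array.
How can I filter out active: false entirely from the aggregation?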

elasticsearch

Source: https://stackoverflow.com/questions/74961300/elastic-search-aggregation-filter-array-for-condition

2 Answers

3df52oht1#

You have to use a **nested aggregation**. See the official documentation for reference.
Here is an example with the queries:

Mapping:

PUT /city_index
{
  "mappings": {
    "properties": {
      "name" : { "type" : "keyword" },
      "origin" : { "type" : "keyword" },
      "travel": { 
        "type": "nested",
        "properties": {
          "active": {
            "type": "boolean"
          },
          "city": {
            "type": "keyword"
          },
          "visited" : {
            "type":"date"
          }
        }
      }
    }
  }
}
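
The reason the mapping declares `travel` as `nested` is that, with the default `object` type, Elasticsearch flattens an array of objects into one value list per sub-field, so the link between `active` and `city` within a single trip is lost. The Python sketch below only imitates that behaviour on Scott's document; `flatten` and `matches_flattened` are hypothetical helpers written for this illustration, not Elasticsearch APIs.

```python
# Toy model of how an array of objects is indexed WITHOUT the nested type.
# It imitates the flattening behaviour; it is not Elasticsearch code.

scott = {
    "name": "Scott",
    "origin": "London",
    "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ],
}


def flatten(doc, field):
    """Collapse an array of objects into one value list per sub-field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat, field, value):
    """A term filter on flattened data matches if ANY value in the list is equal."""
    return value in flat.get(field, [])


flat = flatten(scott, "travel")

# Object type: the filter matches the whole document, so every city is aggregated.
if matches_flattened(flat, "travel.active", True):
    cities_object_type = flat["travel.city"]
else:
    cities_object_type = []

# Nested type: each trip is filtered on its own, so only active trips remain.
cities_nested_type = [trip["city"] for trip in scott["travel"] if trip["active"]]

print(cities_object_type)  # ['Berlin', 'Prague']
print(cities_nested_type)  # ['Prague']
```

This is why Berlin shows up in the question's result: the filter is evaluated against the flattened document rather than against each trip.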

Insert:

PUT /city_index/_doc/1
{
  "name": "Scott", 
  "origin" : "London",
  "travel": [
    {
      "active": false,
      "city": "Berlin",
      "visited" : "2020-02-01"
    },
    {
      "active": true,
      "city": "Prague",
      "visited": "2020-02-15"
    }
  ]
}

PUT /city_index/_doc/2
{
  "name": "Lilly",
  "origin": "London",
  "travel": [
    {
      "active": true,
      "city": "Scotland",
      "visited": "2020-02-01"
    }
  ]
}

Query:

GET /city_index/_search
{
  "size": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Output:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "origin": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "London",
          "doc_count": 2,
          "city": {
            "doc_count": 3,
            "travel": {
              "doc_count": 2,
              "city": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Prague",
                    "doc_count": 1
                  },
                  {
                    "key": "Scotland",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
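
The counts in this response can be checked by hand. The sketch below replays the aggregation steps (`terms` on origin, `nested`, `filter`, `terms` on city) in plain Python over the two sample documents. It illustrates the semantics only and says nothing about how Elasticsearch computes them internally; the key names in `result` are made up for this example.

```python
from collections import Counter

docs = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland", "visited": "2020-02-01"},
    ]},
]

# terms on "origin": one bucket per origin, holding the matching root documents.
by_origin = {}
for doc in docs:
    by_origin.setdefault(doc["origin"], []).append(doc)

result = {}
for origin, bucket_docs in by_origin.items():
    # nested(path="travel"): step down to the individual trip objects.
    trips = [trip for doc in bucket_docs for trip in doc["travel"]]
    # filter(travel.active == true): keep only the active trips.
    active_trips = [trip for trip in trips if trip["active"]]
    # terms on "travel.city": count the remaining trips per city.
    city_counts = Counter(trip["city"] for trip in active_trips)
    result[origin] = {
        "doc_count": len(bucket_docs),
        "nested_doc_count": len(trips),
        "filtered_doc_count": len(active_trips),
        "cities": dict(city_counts),
    }

print(result)
```

The three counts for London (2, 3 and 2) line up with `doc_count` at the origin, nested and filter levels of the response above.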


4dbbbstv2#

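@karthick's suggestion is good. I also added a filter to the query, so that fewer documents reach the aggregation stage. The query-level filter only drops documents that have no active trip at all; the filter aggregation inside the nested aggregation is still needed to exclude inactive cities such as Berlin.

GET idx_travel/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "travel",
      "query": {
        "term": {
          "travel.active": {
            "value": true
          }
        }
      }
    }
  },
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}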
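
One thing to keep in mind with a query-level `nested` filter: it selects whole documents that contain at least one matching trip, and a `nested` aggregation then sees every trip of those documents. The Python sketch below mimics this on the sample data, plus one hypothetical extra traveller (Tom, who has no active trips and is not part of the original data), to show what the query removes and what only a `filter` aggregation removes.

```python
from collections import Counter

docs = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin"},
        {"active": True, "city": "Prague"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland"},
    ]},
    # Hypothetical extra document, added only for this illustration.
    {"name": "Tom", "origin": "London", "travel": [
        {"active": False, "city": "Rome"},
    ]},
]

# Query-level nested filter: keep root documents with at least one active trip.
hits = [d for d in docs if any(t["active"] for t in d["travel"])]

# nested + terms WITHOUT a filter aggregation: all trips of the hits are counted.
without_agg_filter = Counter(t["city"] for d in hits for t in d["travel"])

# nested + filter + terms: only the active trips of the hits are counted.
with_agg_filter = Counter(
    t["city"] for d in hits for t in d["travel"] if t["active"]
)

print([d["name"] for d in hits])  # ['Scott', 'Lilly']
print(dict(without_agg_filter))   # {'Berlin': 1, 'Prague': 1, 'Scotland': 1}
print(dict(with_agg_filter))      # {'Prague': 1, 'Scotland': 1}
```

In this toy model the query removes Tom's document entirely, which is the saving the answer describes, while Berlin only disappears once the trips themselves are filtered.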
