Elasticsearch Java: count and sum sub-aggregation on a nested field

7kjnsjlb posted on 2023-05-22 in ElasticSearch

I have the following structure in an Elasticsearch index:

{
    "_index" : "hotel",
    "_type" : "_doc",
    "_id" : "13171",
    "_score" : 6.072218,
    "_source" : {
      "_class" : "hotel",
      "id" : 13171,
      "places" : [
        {
          "type" : "MAIN_LOCATION",
          "placeId" : 2032
        }
      ],
      "numberOfRecommendations" : 0
    }
  },
  {
    "_index" : "hotel",
    "_type" : "_doc",
    "_id" : "7146",
    "_score" : 6.072218,
    "_source" : {
      "_class" : "hotel",
      "id" : 7146,
      "places" : [
        {
          "type" : "MAIN_LOCATION",
          "placeId" : 2032
        }
      ],
      "numberOfRecommendations" : 1
    }  
  },
  {
    "_index" : "hotel",
    "_type" : "_doc",
    "_id" : "7146",
    "_score" : 6.072218,
    "_source" : {
      "_class" : "hotel",
      "id" : 7146,
      "places" : [
        {
          "type" : "AFFILIATE",
          "placeId" : 2032
        }
      ],
      "numberOfRecommendations" : 3
    }  
  }

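Note that places is a nested type with two possible type values, MAIN_LOCATION and AFFILIATE. I am building an aggregation that, for a given place, counts the hotels and sums the recommendations of the hotels that have that place as their main location.
For the MAIN_LOCATION example above, I should get hotels 2 and numberOfRecommendations 1.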
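To make the expected numbers concrete, here is a small plain-Java sketch that computes them by hand from the three sample documents. It does not touch Elasticsearch at all; the class and field names (ExpectedStats, Place, Hotel, Stats) are hypothetical stand-ins for the sample data, not part of the real code.

```java
import java.util.List;

public class ExpectedStats {

    // Hypothetical in-memory stand-ins for the sample documents.
    static final class Place {
        final String type;
        final long placeId;
        Place(String type, long placeId) { this.type = type; this.placeId = placeId; }
    }

    static final class Hotel {
        final long id;
        final List<Place> places;
        final int numberOfRecommendations;
        Hotel(long id, List<Place> places, int numberOfRecommendations) {
            this.id = id;
            this.places = places;
            this.numberOfRecommendations = numberOfRecommendations;
        }
    }

    static final class Stats {
        final long hotels;
        final long recommendations;
        Stats(long hotels, long recommendations) {
            this.hotels = hotels;
            this.recommendations = recommendations;
        }
    }

    // Counts hotels that have a place with the given type and placeId,
    // and sums the root-level numberOfRecommendations of those hotels.
    static Stats statsFor(List<Hotel> hotels, String type, long placeId) {
        long count = 0;
        long sum = 0;
        for (Hotel h : hotels) {
            boolean matches = h.places.stream()
                .anyMatch(p -> p.type.equals(type) && p.placeId == placeId);
            if (matches) {
                count++;
                sum += h.numberOfRecommendations;
            }
        }
        return new Stats(count, sum);
    }

    // The three documents shown above.
    static List<Hotel> sampleHotels() {
        return List.of(
            new Hotel(13171, List.of(new Place("MAIN_LOCATION", 2032)), 0),
            new Hotel(7146, List.of(new Place("MAIN_LOCATION", 2032)), 1),
            new Hotel(7146, List.of(new Place("AFFILIATE", 2032)), 3));
    }

    public static void main(String[] args) {
        Stats s = statsFor(sampleHotels(), "MAIN_LOCATION", 2032);
        System.out.println("hotels=" + s.hotels
            + ", numberOfRecommendations=" + s.recommendations);
    }
}
```

The AFFILIATE document is skipped, so its 3 recommendations do not contribute to the sum.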
I am using Java and wrote the following code:

public List<PlaceHotelStats> getHotelOfferStats() {

// Create aggregation filter for considering only places with PlaceType from filter(in current
// case main location)
String placeFilterAggregationName = "placeFilter";
BoolQueryBuilder nestedPlaceQuery = boolQuery();
nestedPlaceQuery.must(termQuery("places.type", "MAIN_LOCATION"));
nestedPlaceQuery.must(termsQuery("places.placeId", filter.getPlaceIds()));
AggregationBuilder placeAggregationFilter =
    AggregationBuilders.filters(placeFilterAggregationName, nestedPlaceQuery);

// Add Terms filter to group by field placeId and then add sub aggregation for
// totalRecommendations to have buckets
String aggregationGroupByPlaceId = "group_by_place_id";
var includedPlaceIds = filter.getPlaceIds().stream().mapToLong(l -> l).toArray();
TermsAggregationBuilder aggregationBuilders =
    AggregationBuilders.terms(aggregationGroupByPlaceId)
        .field("places.placeId")
        .size(filter.getPlaceIds().size())
        .includeExclude(new IncludeExclude(includedPlaceIds, null))
        .subAggregation(
            AggregationBuilders.sum("totalRecommendationsForPlace")
                .field("numberOfRecommendations"));

// Add place term aggregation along with recommendation to Filter aggregation
placeAggregationFilter.subAggregation(aggregationBuilders);

// The final aggregation, which applies the filter first and then the sub-aggregation of place
// terms with buckets and recommendation sums
var nestedPlacesAggregation =
    AggregationBuilders.nested(NESTED_PLACES_AGGREGATION_NAME, PLACES)
        .subAggregation(placeAggregationFilter);
var query =
    new NativeSearchQueryBuilder()
        .withQuery(builder.query())
        .addAggregation(nestedPlacesAggregation)
        .build();

var result = elasticsearchOperations.search(query, EsHotel.class, ALIAS_COORDS);

if (!result.hasAggregations()) {
  throw new IllegalStateException("No aggregations found after query with aggregations!");
}

ParsedFilters aggregationParsedFilters =
    ((ParsedNested) result.getAggregations().get(NESTED_PLACES_AGGREGATION_NAME))
        .getAggregations()
        .get(placeFilterAggregationName);
var buckets =
    ((ParsedTerms)
            aggregationParsedFilters
                .getBuckets()
                .get(0)
                .getAggregations()
                .get(aggregationGroupByPlaceId))
        .getBuckets();

List<PlaceHotelStats> placeHotelStats = new ArrayList<>();
buckets.forEach(
    bucket ->
        placeHotelStats.add(
            new PlaceHotelStats(
                bucket.getKeyAsNumber().longValue(),
                Math.toIntExact(bucket.getDocCount()),
                getTotalRecommendationsForPlace(bucket))));

return placeHotelStats;

}

private int getTotalRecommendationsForPlace(Terms.Bucket bucket) {
    var aggregationTotalRecommendation =
        bucket.getAggregations().get("totalRecommendationsForPlace");
    if (aggregationTotalRecommendation != null) {
      return (int) ((ParsedSum) aggregationTotalRecommendation).getValue();
    }
    return 0;
  }

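This gives me the correct total count per place, but not the correct sum of all recommendations.
I checked the Elasticsearch query, and it looks like this: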

{
  "query": {
  "bool" : {
    "must" : [
      {
        "nested" : {
          "query" : {
            "bool" : {
              "must" : [
                {
                  "term" : {
                    "places.type" : {
                      "value" : "MAIN_LOCATION",
                      "boost" : 1.0
                    }
                  }
                },
                {
                  "terms" : {
                    "places.placeId" : [
                      7146
                    ],
                    "boost" : 1.0
                  }
                }
              ],
              "adjust_pure_negative" : true,
              "boost" : 1.0
            }
          },
          "path" : "places",
          "ignore_unmapped" : false,
          "score_mode" : "min",
          "boost" : 1.0
        }
      },
      
      
      {
        "nested" : {
          "query" : {
            "exists" : {
              "field" : "places",
              "boost" : 1.0
            }
          },
          "path" : "places",
          "ignore_unmapped" : false,
          "score_mode" : "none",
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
},
"aggs": {
  "nestedPlaces":{
    "nested":{"path":"places"},
    "aggregations":{
      "placeFilter":{
        "filters":{
          "filters":[{
            "bool":{
              "must":[{
                "term":{"places.type":{"value":"MAIN_LOCATION","boost":1.0}}},
                {"terms":{"places.placeId":[7146],"boost":1.0}}],
                "adjust_pure_negative":true,
                "boost":1.0}
            
          }],
          "other_bucket":false,
          "other_bucket_key":"_other_"},
          "aggregations":{
            "group_by_place_id":{
              "terms":{
                "field":"places.placeId",
                "size":193,
                "min_doc_count":1,
                "shard_min_doc_count":0,
                "show_term_doc_count_error":false,
                "order":[
                  {"_count":"desc"},
                  {"_key":"asc"}],
                  "include":["7146"]},
                  "aggregations":{
                    "totalRecommendationsForPlace":{
                      "sum":{
                        "field":"numberOfRecommendations"
                        
                      }
                      
                    }
                    
                  }
              
            }
            
          }
        
      }
      
    }
    
  }
  
}
}

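In the current output of the query, the total number of hotels is correct, but the total number of recommendations is wrong and is always 0, which means the sub-aggregation is not working as expected: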

"aggregations" : {
    "nestedPlaces" : {
      "doc_count" : 7,
      "placeFilter" : {
        "buckets" : [
          {
            "doc_count" : 3,
            "group_by_place_id" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : 2032,
                  "doc_count" : 3,
                  "totalRecommendationsForPlace" : {
                    "value" : 0.0
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }

I don't know what I am doing wrong.

wsxa1bj1


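Your query is basically correct, up to the point where you try to get the sum of numberOfRecommendations. Because that field is at the root level of the document, not in the nested documents themselves, you first need to add a reverse_nested aggregation to get back to the top-level document. Only then can you use the sum aggregation, like this: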

"group_by_place_id": {
          "terms": {
            "field": "places.placeId",
            "size": 193,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ],
            "include": [
              "7146"
            ]
          },
          "aggregations": {
            "back_to_root": {               <----- add this
              "reverse_nested": {},         <----- add this
              "aggs": {
                "totalRecommendationsForPlace": {
                  "sum": {
                    "field": "numberOfRecommendations"
                  }
                }
              }
            }
          }
        }
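On the Java side, the same change could be sketched roughly as follows. This is a minimal sketch, assuming the package layout of the Elasticsearch 7.x Java high-level REST client (the import paths differ in other versions); the class name ReverseNestedSketch and the size parameter are hypothetical, and the includeExclude call from the question is left out for brevity.

```java
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.nested.ReverseNested;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Sum;

// Hypothetical helper class; assumes the Elasticsearch 7.x client is on the classpath.
public class ReverseNestedSketch {

    // Builds the terms aggregation from the question, with the sum moved
    // under a reverse_nested step so it runs against the root hotel documents.
    static TermsAggregationBuilder groupByPlaceId(int size) {
        return AggregationBuilders.terms("group_by_place_id")
            .field("places.placeId")
            .size(size)
            .subAggregation(
                AggregationBuilders.reverseNested("back_to_root")
                    .subAggregation(
                        AggregationBuilders.sum("totalRecommendationsForPlace")
                            .field("numberOfRecommendations")));
    }

    // Reads the sum one level deeper than before: bucket -> back_to_root -> sum.
    static int totalRecommendations(Terms.Bucket bucket) {
        ReverseNested backToRoot = bucket.getAggregations().get("back_to_root");
        if (backToRoot == null) {
            return 0;
        }
        Sum sum = backToRoot.getAggregations().get("totalRecommendationsForPlace");
        return sum == null ? 0 : (int) sum.getValue();
    }
}
```

With this shape, getTotalRecommendationsForPlace from the question has to look inside back_to_root first, because the sum is no longer a direct child of the terms bucket. The doc count of back_to_root is the number of root hotel documents, while the terms bucket's own doc count refers to nested place documents.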

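PS: If the number of recommendations can differ by type (MAIN_LOCATION or AFFILIATE), then you should store that number at the nested level, and your query will work as is.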
