Elasticsearch multi-terms aggregation for finding duplicates

Posted by lstz6jyr on 2023-03-07 in ElasticSearch

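In my Elasticsearch index I have duplicate documents, in which some **"unique" fields** have the same value.
To fix them I first have to find them, so I used an aggregation query with min_doc_count=2. The problem is that I can only run it with one key, not with two. This is what works: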

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "terms": {
        "field": "key1",
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}

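I want to match on two keys at the same time, but how do I add the second field, key2?
Any ideas?
I tried using a multi-terms aggregation, as shown below (I don't know whether the syntax is correct):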

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "multi_terms": {
        "terms": [
          {
            "field": "key1"
          },
          {
            "field": "key2"
          }
        ],
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}

But I got this error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown aggregation type [multi_terms] did you mean [rare_terms]?",
        "line" : 5,
        "col" : 26
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown aggregation type [multi_terms] did you mean [rare_terms]?",
    "line" : 5,
    "col" : 26,
    "caused_by" : {
      "type" : "named_object_not_found_exception",
      "reason" : "[5:26] unknown field [multi_terms]"
    }
  },
  "status" : 400
}
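
One possible reading of this error is that the cluster does not know the `multi_terms` aggregation type at all, which is what an older version would report. As far as I know, `multi_terms` first shipped in Elasticsearch 7.12; treat that threshold as an assumption and check it against the documentation for your release. A minimal Python sketch of the version check, where the helper name `supports_multi_terms` is hypothetical and the input is the `version.number` string returned by `GET /`:

```python
# Assumption: multi_terms was introduced in Elasticsearch 7.12.
MULTI_TERMS_MIN_VERSION = (7, 12)


def supports_multi_terms(version_number: str) -> bool:
    """Return True if the version string is at least 7.12.

    `version_number` is expected to look like "7.10.2" or "8.6.0-SNAPSHOT".
    """
    core = version_number.split("-", 1)[0]  # drop a suffix such as -SNAPSHOT
    major, minor = (int(part) for part in core.split(".")[:2])
    return (major, minor) >= MULTI_TERMS_MIN_VERSION


if __name__ == "__main__":
    for version in ("7.10.2", "7.12.0", "8.6.0"):
        print(version, supports_multi_terms(version))
```

If the check comes back false, the request syntax is probably not the problem and one of the workarounds in the answers is needed.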
Answer 1 (7bsow1i6)

You can also do this with a script:

GET /docs/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "terms": {
        "script": "doc['key1'].value + '_' + doc['key2'].value",
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}

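Be aware, though, that this can be slower than a terms aggregation on a plain field, because the script has to be evaluated for every document.
Here are some sample documents: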

POST docs/_doc
{
  "key1": 1,
  "key2": 2
}
POST docs/_doc
{
  "key1": 1,
  "key2": 2
}
POST docs/_doc
{
  "key1": 2,
  "key2": 1
}

And the result of the query above:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "receipts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1_2",
          "doc_count": 2
        }
      ]
    }
  }
}
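
To make the script's behavior concrete, here is a small Python sketch that reproduces the same grouping locally on the three sample documents. It only illustrates the logic (join `key1` and `key2` with `_`, count, keep keys seen at least twice) and says nothing about how Elasticsearch executes it; the function name `duplicate_keys` is made up for this example.

```python
from collections import Counter


def duplicate_keys(docs, fields=("key1", "key2"), min_doc_count=2):
    """Group docs by the joined field values; keep keys seen at least min_doc_count times."""
    counts = Counter("_".join(str(doc[f]) for f in fields) for doc in docs)
    return {key: n for key, n in counts.items() if n >= min_doc_count}


# The same three sample documents as above.
docs = [
    {"key1": 1, "key2": 2},
    {"key1": 1, "key2": 2},
    {"key1": 2, "key2": 1},
]

print(duplicate_keys(docs))  # {'1_2': 2}
```

One thing to watch with a joined key: if the values are strings that can themselves contain `_`, different pairs can collide (for example `a_` with `b` and `a` with `_b` both give `a__b`), so pick a separator that cannot occur in the data.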
Answer 2 (vkc1a9a2)

Elasticsearch sub-aggregations can solve your problem.

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "terms": {
        "field": "key1",
        "min_doc_count": 2,
        "size": 1000000
      },
      "aggs": {
        "NAME": {
          "terms": {
            "field": "key2",
            "min_doc_count": 2,
            "size": 1000000
          }
        }
      }
    }
  }
}
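
With this approach the pairs come back nested: each `key1` bucket carries a `NAME` sub-aggregation holding the `key2` buckets. Below is a rough Python sketch of flattening that into `(key1, key2, doc_count)` tuples. The `response` dict is hand-written in the shape a terms aggregation response normally has, using the sample documents from the other answer; it is not real output, and `flatten_pairs` is just a name picked for illustration.

```python
def flatten_pairs(response, outer="receipts", inner="NAME"):
    """Turn nested terms buckets into a flat list of (key1, key2, doc_count)."""
    pairs = []
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            pairs.append(
                (outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"])
            )
    return pairs


# Hypothetical response: only key1=1 occurs twice, and both docs have key2=2.
response = {
    "aggregations": {
        "receipts": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 2,
                    "NAME": {"buckets": [{"key": 2, "doc_count": 2}]},
                }
            ]
        }
    }
}

print(flatten_pairs(response))  # [(1, 2, 2)]
```

The inner `doc_count` is the one that tells you how many documents share both keys; the outer count only covers `key1`.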
