Finding duplicates by ID across two indices in Elasticsearch

jobtbby3  posted on 2022-11-22  in ElasticSearch

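I have several indices that store my data by week, named with the pattern myindex-2022-weekOfYear.
How can I find all duplicates by ID across these indices?
I tried using an aggregation (based on another question here):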

GET myindex-*/_search
{
  "stored_fields": [
    "myKey"
  ],
  "size": 100,
  "aggs": {
    "duplicateNames": {
      "terms": {
        "field": "myKey",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}

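But this query does not seem to work: when I search for a single document by ID (taken from the query result), it is found in only one index, so I assume min_doc_count is not working as expected.
Edit: the response I see: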

"genres" : {
  "doc_count_error_upper_bound" : 530,
  "sum_other_doc_count" : 357290963,
  "buckets" : [ ]
}

So shard_size is probably too small (and I cannot increase it because of the resource limits of my ES cluster).

dz6r00yl  #1

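TL;DR

I could not find out why your query is not working, but I built a proof of concept showing that the approach itself works correctly (on a fairly small data set).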

Setup

POST _bulk
{"index": {"_index": "74473038-0", "_id": "1"}}
{"data": "some dummy data", "id": 1}
{"index": {"_index": "74473038-1", "_id": "1"}}
{"data": "some dummy data", "id": 1}
{"index": {"_index": "74473038-2", "_id": "1"}}
{"data": "some dummy data", "id": 1}
{"index": {"_index": "74473038-3", "_id": "1"}}
{"data": "some dummy data", "id": 1}
{"index": {"_index": "74473038-0", "_id": "2"}}
{"data": "some dummy data", "id": 2}
{"index": {"_index": "74473038-2", "_id": "2"}}
{"data": "some dummy data", "id": 2}
{"index": {"_index": "74473038-0", "_id": "3"}}
{"data": "some dummy data", "id": 3}
{"index": {"_index": "74473038-1", "_id": "3"}}
{"data": "some dummy data", "id": 3}
{"index": {"_index": "74473038-3", "_id": "3"}}
{"data": "some dummy data", "id": 3}
{"index": {"_index": "74473038-0", "_id": "4"}}
{"data": "some dummy data", "id": 4}
{"index": {"_index": "74473038-2", "_id": "4"}}
{"data": "some dummy data", "id": 4}
{"index": {"_index": "74473038-3", "_id": "4"}}
{"data": "some dummy data", "id": 4}
{"index": {"_index": "74473038-0", "_id": "5"}}
{"data": "some dummy data", "id": 5}

GET 74473038-*/_search
{
  "size": 0, 
  "aggs": {
    "duplicateNames": {
      "terms": {
        "field": "id",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}

I get the expected result: the documents with id 1, 2, 3 and 4 are returned, and 5 is omitted.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 13,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "duplicateNames": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 4,
          "duplicateDocuments": {
            "hits": {
              "total": {
                "value": 4,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "74473038-0",
                  "_id": "1",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 1
                  }
                },
                {
                  "_index": "74473038-1",
                  "_id": "1",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 1
                  }
                },
                {
                  "_index": "74473038-3",
                  "_id": "1",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 1
                  }
                }
              ]
            }
          }
        },
        {
          "key": 3,
          "doc_count": 3,
          "duplicateDocuments": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "74473038-0",
                  "_id": "3",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 3
                  }
                },
                {
                  "_index": "74473038-1",
                  "_id": "3",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 3
                  }
                },
                {
                  "_index": "74473038-3",
                  "_id": "3",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 3
                  }
                }
              ]
            }
          }
        },
        {
          "key": 4,
          "doc_count": 3,
          "duplicateDocuments": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "74473038-3",
                  "_id": "4",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 4
                  }
                },
                {
                  "_index": "74473038-2",
                  "_id": "4",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 4
                  }
                },
                {
                  "_index": "74473038-0",
                  "_id": "4",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 4
                  }
                }
              ]
            }
          }
        },
        {
          "key": 2,
          "doc_count": 2,
          "duplicateDocuments": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "74473038-2",
                  "_id": "2",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 2
                  }
                },
                {
                  "_index": "74473038-0",
                  "_id": "2",
                  "_score": 1,
                  "_source": {
                    "data": "some dummy data",
                    "id": 2
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
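The proof of concept has few keys, so a single terms aggregation returns every duplicate. On an index set with many distinct keys, one option that avoids raising shard_size is to split the keys into partitions with the terms aggregation's `include.partition` / `include.num_partitions` filter and run one request per partition. The following is only a minimal Python sketch under that assumption: the function names, the default `size` of 1000 and the `top_hits` size of 10 are illustrative choices and not part of the original answer, and sending the request is left to whichever client you use. Note that `top_hits` returns 3 hits per bucket by default, which is why the bucket for key 1 above lists only three of its four documents.

```python
from typing import Any, Dict, List


def build_duplicates_query(field: str, partition: int, num_partitions: int,
                           size: int = 1000) -> Dict[str, Any]:
    """Build a search body that looks for duplicated keys in one partition.

    The terms aggregation only considers keys that hash into `partition`,
    so each request handles roughly 1/num_partitions of the distinct keys.
    """
    if not 0 <= partition < num_partitions:
        raise ValueError("partition must be in [0, num_partitions)")
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {
            "duplicateNames": {
                "terms": {
                    "field": field,
                    "min_doc_count": 2,  # keep keys seen at least twice
                    "size": size,        # hypothetical default, tune to your data
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                },
                "aggs": {
                    # Raise top_hits above its default of 3 and skip _source,
                    # since only _index and _id are needed here.
                    "duplicateDocuments": {
                        "top_hits": {"size": 10, "_source": False}
                    }
                },
            }
        },
    }


def duplicates_from_response(response: Dict[str, Any],
                             agg_name: str = "duplicateNames",
                             hits_name: str = "duplicateDocuments"
                             ) -> Dict[Any, List[str]]:
    """Map each duplicated key to the sorted indices listed in its top_hits."""
    buckets = (response.get("aggregations", {})
                       .get(agg_name, {})
                       .get("buckets", []))
    result: Dict[Any, List[str]] = {}
    for bucket in buckets:
        hits = bucket[hits_name]["hits"]["hits"]
        result[bucket["key"]] = sorted({hit["_index"] for hit in hits})
    return result
```

To sweep the whole key space, call `build_duplicates_query("myKey", p, n)` for every `p` in `range(n)`, send each body to `myindex-*/_search`, and merge the dictionaries returned by `duplicates_from_response`. A key that maps to a single index was duplicated inside that index rather than across indices.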
