Elasticsearch: same _id with different _routing values

Posted by u0sqgete on 2023-06-05 in ElasticSearch

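According to the Elasticsearch documentation, documents with the same _id can be indexed with different _routing values. The documentation therefore states that uniqueness of _id is not guaranteed, because such documents may end up on different shards (this appears to be a feature rather than a bug).
What happens if two documents with the same _id, indexed with different routing values, end up on the same shard? Consider the following requests: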

PUT test-index
{
  "settings": {
    "index": {
      "number_of_shards": 2
    }
  }
}

PUT test-index/_doc/1?routing=user1
{
  "title": "This is document number with routing=user1"
}

PUT test-index/_doc/1?routing=user2
{
  "title": "This is document number with routing=user2"
}

GET test-index/_search

The search query returns the following result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test-index",
        "_id": "1",
        "_score": 1,
        "_routing": "user2",
        "_source": {
          "title": "This is document number with routing=user2"
        }
      }
    ]
  }
}

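Why does the search response show only the user2 document, even though the index has 2 shards? I'm fairly sure that both documents end up on the same shard, because according to the formula: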

shard_num = (hash(_routing) % num_routing_shards) / routing_factor
where routing_factor = num_routing_shards / num_primary_shards

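routing_factor is 1 (i.e. 2 routing shards / 2 primary shards), so the shard ID is essentially the hash of the _routing value, mod 2.
Using your routing values, we get the following shard IDs (you can experiment with murmur3 here):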
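The formula can be written as a small Python function. This is a sketch of the arithmetic only: `shard_num` and its parameters are illustrative names, and the hash value is passed in rather than computed. Python's `%` is a floor modulo, so a negative (signed 32-bit) hash still maps into range. One hedged caveat: in recent Elasticsearch versions `index.number_of_routing_shards` can default to a value larger than the number of primaries (so the index can later be split), in which case routing_factor is not 1. That is why both counts are parameters here.

```python
def shard_num(routing_hash: int, num_routing_shards: int, num_primary_shards: int) -> int:
    """Evaluate shard_num = (hash(_routing) % num_routing_shards) / routing_factor.

    routing_factor = num_routing_shards / num_primary_shards, and the division
    in the formula is an integer division.
    """
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    # Python's % is a floor modulo: a negative hash still lands in [0, num_routing_shards).
    return (routing_hash % num_routing_shards) // routing_factor


# The question's setup: 2 routing shards / 2 primaries -> routing_factor 1, i.e. hash % 2.
print(shard_num(3305849917, 2, 2))  # 1
print(shard_num(4180509323, 2, 2))  # 1
```

With a larger routing-shard count (for example 1024 over 2 primaries, routing_factor 512), the shard is decided by which half of the `[0, 1024)` range the hash falls into, not by its parity.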

murmur3("user1") % 2 = 3305849917 % 2 = shard 1
murmur3("user2") % 2 = 4180509323 % 2 = shard 1
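The hashes above come from an online calculator. For local experiments, here is a pure-Python sketch of MurmurHash3 (x86, 32-bit variant). One assumption is worth flagging: as far as I know, Elasticsearch's `Murmur3HashFunction` does not hash the UTF-8 bytes of the routing string. Instead it hashes each UTF-16 code unit as two bytes (low byte first) with seed 0 and treats the result as a signed 32-bit int. The hypothetical helper `es_routing_hash` mimics that assumption, so its values can differ from those of a calculator that hashes UTF-8 text.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant; returns an unsigned 32-bit integer."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl32(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = rotl32(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    # Body: process 4-byte little-endian blocks.
    for i in range(n_blocks):
        h ^= mix_k(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the last 1-3 bytes, read little-endian and mixed into h.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then the fmix32 avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def es_routing_hash(routing: str) -> int:
    """Hypothetical re-creation of Elasticsearch's routing hash (an assumption):
    UTF-16 code units as two bytes each, low byte first, seed 0, signed 32-bit result."""
    unsigned = murmur3_x86_32(routing.encode("utf-16-le"), seed=0)
    return unsigned - (1 << 32) if unsigned >= (1 << 31) else unsigned


for routing in ("user1", "user2"):
    # Hashing the UTF-8 bytes, as many online calculators are assumed to do.
    utf8_hash = murmur3_x86_32(routing.encode("utf-8"))
    es_hash = es_routing_hash(routing)
    # With 2 routing shards and routing_factor 1, the shard is es_hash mod 2 (floor modulo).
    print(routing, utf8_hash, es_hash, es_hash % 2)
```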

But if two documents with the same _id and different _routing values both end up on the same shard, why is only one document shown?

Answer by kx1ctssn

TL;DR

Because the two documents share the same _id on the same shard, the second request is not an insert but an update.
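The TL;DR can be illustrated with a toy, in-memory model. This is not Elasticsearch code: `ToyShard`, `ToyIndex` and the `shard_for` routing callback are hypothetical names, and the model assumes only what the answer states, namely that a shard keeps at most one document per _id while _routing merely decides which shard receives the write. Routing both writes to the same shard reproduces the created/updated sequence and the single search hit; routing them to different shards yields two hits, which is the case the documentation warns about.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyShard:
    """Hypothetical stand-in for one shard: at most one live document per _id."""
    docs: dict = field(default_factory=dict)  # _id -> (version, routing, source)

    def index(self, doc_id: str, routing: str, source: dict) -> dict:
        if doc_id in self.docs:
            version, result = self.docs[doc_id][0] + 1, "updated"
        else:
            version, result = 1, "created"
        # _id is the only key; routing is kept as metadata, not as part of the identity.
        self.docs[doc_id] = (version, routing, source)
        return {"_id": doc_id, "_version": version, "result": result}


@dataclass
class ToyIndex:
    shards: list

    def index(self, doc_id: str, routing: str, source: dict,
              shard_for: Callable[[str], int]) -> dict:
        # shard_for is a hypothetical routing function: routing value -> shard number.
        return self.shards[shard_for(routing)].index(doc_id, routing, source)

    def search(self) -> list:
        # A match_all search fans out to every shard and concatenates the hits.
        return [
            {"_id": doc_id, "_routing": routing, "_source": source}
            for shard in self.shards
            for doc_id, (_, routing, source) in shard.docs.items()
        ]


def same_shard(routing: str) -> int:
    return 1  # both routing values land on shard 1, as the question assumes


idx = ToyIndex([ToyShard(), ToyShard()])
print(idx.index("1", "user1", {"title": "routing=user1"}, same_shard)["result"])  # created
print(idx.index("1", "user2", {"title": "routing=user2"}, same_shard)["result"])  # updated
print(len(idx.search()))  # 1

split = {"user1": 0, "user2": 1}.get  # hypothetical: routing values on different shards
idx2 = ToyIndex([ToyShard(), ToyShard()])
idx2.index("1", "user1", {}, split)
idx2.index("1", "user2", {}, split)
print(len(idx2.search()))  # 2
```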

Evidence:

If you run the following commands in order:

PUT 76349386
{
  "settings": {
    "index": {
      "number_of_shards": 2
    }
  }
}

Then running

PUT 76349386/_doc/1?routing=user1
{
  "title": "This is document number with routing=user1"
}

gives you:

{
  "_index": "76349386",
  "_id": "1",
  "_version": 1,
  "result": "created",  <= Here it says the result of the operation was a creation
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

But when you run the second command

PUT 76349386/_doc/1?routing=user2
{
  "title": "This is document number with routing=user2"
}

the response looks a bit different:

{
  "_index": "76349386",
  "_id": "1",
  "_version": 2,
  "result": "updated", <= it is an update.
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

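The document with _id 1 was updated: its _version went from 1 to 2, and it now carries routing user2, which is why the search returns only the user2 document.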
