Elasticsearch: re-indexing the same document changes its score

Posted by fdx2calv on 2023-02-15 in ElasticSearch

We indexed the following document:

POST sample-index-test/_doc/1
{
    "first_name": "James",
    "last_name" : "Osaka"
}

With only this one document in the index, we ran the _explain API with a match query:

GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}

The explain API returns the following details:

  • Score: 0.2876821
  • Number of documents containing the term: 1
  • Total number of documents with the field: 1
{
  "_index" : "sample-index-test",
  "_type" : "_doc",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.2876821,
    "description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 0.2876821,
        "description" : "score(freq=1.0), computed as boost * idf * tf from:",
        "details" : [
          {
            "value" : 2.2,
            "description" : "boost",
            "details" : [ ]
          },
          {
            "value" : 0.2876821,
            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details" : [
              {
                "value" : 1,
                "description" : "n, number of documents containing term",
                "details" : [ ]
              },
              {
                "value" : 1,
                "description" : "N, total number of documents with field",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 0.45454544,
            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "freq, occurrences of term within document",
                "details" : [ ]
              },
              {
                "value" : 1.2,
                "description" : "k1, term saturation parameter",
                "details" : [ ]
              },
              {
                "value" : 0.75,
                "description" : "b, length normalization parameter",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "dl, length of field",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "avgdl, average length of field",
                "details" : [ ]
              }
            ]
          }
        ]
      }
    ]
  }
}
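The explanation above is Lucene's BM25 similarity, score = boost * idf * tf. As a sanity check, a minimal Python sketch that plugs in the values reported by _explain reproduces the score. `bm25_term_score` is an illustrative name, not an Elasticsearch API, and the 2.2 boost is simply copied from the output.

```python
import math

def bm25_term_score(freq, n, N, dl, avgdl, k1=1.2, b=0.75, boost=2.2):
    """Recompute the per-term score shown by _explain: boost * idf * tf."""
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf

# Values taken from the explain output for the single-document index
score = bm25_term_score(freq=1.0, n=1, N=1, dl=1.0, avgdl=1.0)
print(round(score, 6))  # 0.287682
```

Here boost * tf is about 1.0, so the score is essentially the idf; that is why only n and N matter in this question.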

Next, we ran the following index request for the same document ID several times within a few seconds:

POST sample-index-test/_doc/1
{
    "first_name": "James",
    "last_name" : "Cena"
}

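Running the same _explain request again returns a different score, together with different values for the number of documents containing the term and the total number of documents with the field: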

  • Score: 0.046520013
  • Number of documents containing the term: 10
  • Total number of documents with the field: 10
{
  "_index" : "sample-index-test",
  "_type" : "_doc",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.046520013,
    "description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 0.046520013,
        "description" : "score(freq=1.0), computed as boost * idf * tf from:",
        "details" : [
          {
            "value" : 2.2,
            "description" : "boost",
            "details" : [ ]
          },
          {
            "value" : 0.046520017,
            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details" : [
              {
                "value" : 10,
                "description" : "n, number of documents containing term",
                "details" : [ ]
              },
              {
                "value" : 10,
                "description" : "N, total number of documents with field",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 0.45454544,
            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "freq, occurrences of term within document",
                "details" : [ ]
              },
              {
                "value" : 1.2,
                "description" : "k1, term saturation parameter",
                "details" : [ ]
              },
              {
                "value" : 0.75,
                "description" : "b, length normalization parameter",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "dl, length of field",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "avgdl, average length of field",
                "details" : [ ]
              }
            ]
          }
        ]
      }
    ]
  }
}
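Only n and N differ between the two explain outputs; boost and tf are identical. Because "james" appears in every counted document (n = N), the idf reduces to log(1 + 0.5 / (N + 0.5)), which shrinks as N grows. A small Python sketch of that effect (the function name `bm25_idf` is illustrative):

```python
import math

def bm25_idf(n, N):
    """idf as printed by _explain: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# The term is in every counted document, so n == N.
# boost * tf is about 1.0 here (2.2 * 0.45454544), so the score is essentially the idf.
for count in (1, 2, 5, 10):
    print(count, round(bm25_idf(count, count), 7))
```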

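Why does Elasticsearch increase both the total number of documents with the field and the number of documents containing the term, when the index still contains only a single document?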

Answer #1, by sxpgvts3

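Elasticsearch is built on Lucene, which stores all documents in segments. Segments are immutable, so updating a document is a two-step process: a new copy of the document is written, and the old copy is marked as deleted. When you create the first document, the index holds exactly one document. If the same document is then indexed 10 times in total (the original create plus 9 updates), there are 9 deleted copies and 1 live copy. The deleted copies are still counted in the term statistics until their segments are merged away, so "total number of documents with field" (N) and "number of documents containing term" (n) both go up.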
You can verify this with the _forcemerge endpoint. A force merge merges the segments and purges the deleted documents from them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
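To illustrate the mechanism, here is a deliberately simplified Python model. It is not Lucene code, and `ToyShard`, `index`, `term_stats` and `force_merge` are hypothetical names. It assumes, as the answer describes, that an update only flags the old copy as deleted and that deleted copies keep counting toward n and N until a merge drops them.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    fields: dict
    deleted: bool = False

@dataclass
class ToyShard:
    """Toy model of a shard: segments are never rewritten, deletes are only flags."""
    segments: list = field(default_factory=list)

    def index(self, doc_id, fields):
        # An update does not modify the old copy in place: flag it as deleted...
        for segment in self.segments:
            for doc in segment:
                if doc.doc_id == doc_id and not doc.deleted:
                    doc.deleted = True
        # ...and write a new copy. For simplicity, every write gets its own segment.
        self.segments.append([Doc(doc_id, dict(fields))])

    def term_stats(self, field_name, term):
        # Deleted-but-unmerged copies are still counted, like in the explain output.
        docs = [d for seg in self.segments for d in seg if field_name in d.fields]
        # Lowercasing is a crude stand-in for the standard analyzer.
        n = sum(1 for d in docs if d.fields[field_name].lower() == term)
        return n, len(docs)

    def force_merge(self):
        # Merging copies only live docs into a single new segment.
        live = [d for seg in self.segments for d in seg if not d.deleted]
        self.segments = [live]

shard = ToyShard()
shard.index("1", {"first_name": "James", "last_name": "Osaka"})
for _ in range(9):
    shard.index("1", {"first_name": "James", "last_name": "Cena"})
print(shard.term_stats("first_name", "james"))  # (10, 10)
shard.force_merge()
print(shard.term_stats("first_name", "james"))  # (1, 1)
```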

## 1. Create the document
POST sample-index-test/_doc/1
{
    "first_name": "James",
    "last_name" : "Osaka"
}

## 2. Get the explain score
GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1

## 3.1. Execute this several times (e.g. 9 times, so the document is indexed 10 times in total)
POST sample-index-test/_doc/1
{
    "first_name": "James",
    "last_name" : "Cena"
}

## 3.2 You can execute this one also
POST sample-index-test/_update/1
{
  "script" : "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';"
}

## 3.3 Even you can use _update_by_query
POST sample-index-test/_update_by_query
{
  "query": {
    "match": {
      "first_name": "James"
    }
  },
  "script": {
    "source": "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';",
    "lang": "painless"
  }
}

## 4. Get the explain score
GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}
## "value": 0.046520013,
## n, number of documents containing term => 10
## N, total number of documents with field => 10

## 5. Execute the force merge. 
POST sample-index-test/_forcemerge

## 6. The merge result may not show up immediately, so wait a couple of seconds before running explain again.
GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1
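To compare the outputs of steps 2, 4 and 6 without reading through the nested JSON by hand, a small Python sketch can pull the score, n and N out of a saved _explain response. It matches on the description strings seen in this thread, which is an assumption: they may differ between Elasticsearch/Lucene versions. It also assumes a single-term query like the match on "James" above. The name `explain_stats` is illustrative.

```python
import json

def explain_stats(response):
    """Extract the score and the BM25 'n' and 'N' values from an _explain response."""
    stats = {"score": response["explanation"]["value"]}
    stack = [response["explanation"]]
    while stack:
        node = stack.pop()
        desc = node.get("description", "")
        if desc.startswith("n, number of documents containing term"):
            stats["n"] = node["value"]
        elif desc.startswith("N, total number of documents with field"):
            stats["N"] = node["value"]
        # Walk the nested "details" lists depth-first.
        stack.extend(node.get("details", []))
    return stats

# Trimmed-down copy of the step 4 response from this thread
sample = json.loads("""
{"explanation": {"value": 0.046520013, "description": "weight(first_name:james in 0)",
  "details": [
    {"value": 10, "description": "n, number of documents containing term", "details": []},
    {"value": 10, "description": "N, total number of documents with field", "details": []}
  ]}}
""")
print(explain_stats(sample))
```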
