We have indexed the following document:
POST sample-index-test/_doc/1
{
  "first_name": "James",
  "last_name" : "Osaka"
}
With only this single document in the index, we run the _explain API with a match query:
GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}
The _explain API returns the following details:
- score: 0.2876821
- number of documents containing the term (n): 1
- total number of documents with the field (N): 1
{
  "_index" : "sample-index-test",
  "_type" : "_doc",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.2876821,
    "description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 0.2876821,
        "description" : "score(freq=1.0), computed as boost * idf * tf from:",
        "details" : [
          {
            "value" : 2.2,
            "description" : "boost",
            "details" : [ ]
          },
          {
            "value" : 0.2876821,
            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details" : [
              {
                "value" : 1,
                "description" : "n, number of documents containing term",
                "details" : [ ]
              },
              {
                "value" : 1,
                "description" : "N, total number of documents with field",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 0.45454544,
            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "freq, occurrences of term within document",
                "details" : [ ]
              },
              {
                "value" : 1.2,
                "description" : "k1, term saturation parameter",
                "details" : [ ]
              },
              {
                "value" : 0.75,
                "description" : "b, length normalization parameter",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "dl, length of field",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "avgdl, average length of field",
                "details" : [ ]
              }
            ]
          }
        ]
      }
    ]
  }
}
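The score in the response can be reproduced by plugging the components from the explanation into the BM25 formula it quotes. A minimal sketch in plain Python (all values taken from the response above):

```python
import math

# BM25 components from the _explain response
boost = 2.2
n, N = 1, 1              # docs containing the term / docs with the field
freq, k1, b = 1.0, 1.2, 0.75
dl, avgdl = 1.0, 1.0     # field length / average field length

# idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
idf = math.log(1 + (N - n + 0.5) / (n + 0.5))

# tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))

score = boost * idf * tf
print(round(score, 7))   # 0.2876821
```

Note that with these parameters boost * tf = 2.2 / 2.2 = 1, so the final score equals the idf term, which is why both show 0.2876821.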
Now, update the same document several times within a few seconds by re-running the index request:
POST sample-index-test/_doc/1
{
  "first_name": "James",
  "last_name" : "Cena"
}
Running the same _explain API again returns a different score, along with different values for the number of documents containing the term and the total number of documents with the field:
- score: 0.046520013
- number of documents containing the term (n): 10
- total number of documents with the field (N): 10
{
  "_index" : "sample-index-test",
  "_type" : "_doc",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.046520013,
    "description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
    "details" : [
      {
        "value" : 0.046520013,
        "description" : "score(freq=1.0), computed as boost * idf * tf from:",
        "details" : [
          {
            "value" : 2.2,
            "description" : "boost",
            "details" : [ ]
          },
          {
            "value" : 0.046520017,
            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details" : [
              {
                "value" : 10,
                "description" : "n, number of documents containing term",
                "details" : [ ]
              },
              {
                "value" : 10,
                "description" : "N, total number of documents with field",
                "details" : [ ]
              }
            ]
          },
          {
            "value" : 0.45454544,
            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details" : [
              {
                "value" : 1.0,
                "description" : "freq, occurrences of term within document",
                "details" : [ ]
              },
              {
                "value" : 1.2,
                "description" : "k1, term saturation parameter",
                "details" : [ ]
              },
              {
                "value" : 0.75,
                "description" : "b, length normalization parameter",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "dl, length of field",
                "details" : [ ]
              },
              {
                "value" : 1.0,
                "description" : "avgdl, average length of field",
                "details" : [ ]
              }
            ]
          }
        ]
      }
    ]
  }
}
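The boost and tf terms are unchanged between the two responses, so the entire drop in score comes from idf. Plugging the new document counts into the same formula reproduces the new value:

```python
import math

# after the updates: 10 docs contain the term, 10 docs have the field
n = N = 10

# idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
# idf is now roughly 0.04652, matching the response above,
# versus 0.2876821 when n = N = 1
```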
Why does Elasticsearch increase the count of total documents with the field and the count of documents containing the term, even though the index contains only a single document?
1 Answer
Elasticsearch is built on Lucene, which stores all documents in segments. Segments are immutable, so updating a document is a two-step process: a new version of the document is written, and the old version is only marked as deleted. When you index the first document, the segment holds exactly one document. After you update that same document so that ten versions have been written, the segments hold 1 live document and 9 deleted ones. Deleted documents are not physically removed until the segments are merged, and until then they are still counted in the index statistics BM25 uses, which is why "total number of documents with field" and "number of documents containing term" change.
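To see the deleted documents that inflate these statistics, you can inspect the index's segments with the _segments index API (its response lists per-segment document counts; the exact numbers you see depend on how merging has proceeded on your cluster):

GET sample-index-test/_segments

In the response, each segment reports a num_docs (live documents) and a deleted_docs count; right after the updates you should see the deleted versions still present.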
You can verify this with the _forcemerge endpoint: a force merge merges the segments and expunges the deleted documents from them. See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
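In the console style used above, the check looks like this (only_expunge_deletes is an optional parameter of the force-merge API that restricts the merge to expunging deleted documents):

POST sample-index-test/_forcemerge?only_expunge_deletes=true

After the merge completes, re-running the same _explain request should report n = 1 and N = 1 again, and the score should return to its original value.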