I have an Elasticsearch index with 113 million documents (about 133 GB in size):
"mappings": {
"dynamic": "strict",
"properties": {
...some fields...,
"id": {
"type": "long"
}
}
},
"settings": {
"index": {
...
"number_of_shards": "1",
"provided_name": "users_index",
"number_of_replicas": "1",
...
}
}
When I run GET users_index/_search with
{
"_source": false,
"track_total_hits": true,
"size": 1,
"sort": {
"id": "desc"
}
}
I get this response:
{
"took": 1960,
"timed_out": true,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 113978032,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "users_index",
"_type": "_doc",
"_id": "108384632",
"_score": null,
"sort": [
108384632
]
}
]
}
}
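But 108384632 is not the maximum id in the index: if I search for the exact id=114098981, I get its document. Can anyone help me with this? Could it be that, because of the index size, Elasticsearch skips some documents as an optimization? How do I correctly sort documents by id?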
When I run the same query with the filter id > {max id from first request}
{
"_source": false,
"track_total_hits": true,
"size": 1,
"sort": {
"id": "desc"
},
"query": {
"range": {
"id": {"gt": 108384632}
}
}
}
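The check above can be expressed as a small Python helper. This is a sketch, not part of the original post. The function names are hypothetical, and the helpers only build the request body and read a response dict already parsed from JSON, so they work with whatever client sends the request. A candidate counts as the true maximum only when the range query matches no documents and the response did not time out.

```python
def build_range_check_query(candidate_max: int) -> dict:
    """Search body asking for the highest id strictly above candidate_max (mirrors the query above)."""
    return {
        "_source": False,
        "track_total_hits": True,
        "size": 1,
        "sort": {"id": "desc"},
        "query": {"range": {"id": {"gt": candidate_max}}},
    }


def is_true_max(range_check_response: dict) -> bool:
    """True only if the range check finished in time and matched no documents."""
    if range_check_response.get("timed_out", False):
        # A timed-out response may be partial, so it cannot prove the candidate is the maximum.
        return False
    return range_check_response["hits"]["total"]["value"] == 0


body = build_range_check_query(108384632)
# The range-filtered response above reported 5793959 matching documents.
resp = {"timed_out": False, "hits": {"total": {"value": 5793959, "relation": "eq"}}}
print(is_true_max(resp))
```

With the numbers from this post, the check reports that 108384632 is not the maximum.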
I get the correct result:
{
"took": 3318,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5793959,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "users_index",
"_type": "_doc",
"_id": "114186327",
"_score": null,
"sort": [
114186327
]
}
]
}
}
I also tried to get the result with an aggregation, but the result was even stranger. Query and response:
{
"_source": false,
"track_total_hits": true,
"size": 0,
"aggs": {
"test": {
"terms": {
"size": 1,
"field": "id",
"order": {"_key": "desc"}
}
}
}
}
{
"took": 2580,
"timed_out": true,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 114065377,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"test": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 4918762,
"buckets": [
{
"key": 44050081,
"doc_count": 1
}
]
}
}
}
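One way to quantify how odd this result is: assuming every document has exactly one value in the "id" field, the bucket doc_count plus sum_other_doc_count equals the number of documents the terms aggregation actually counted. The Python sketch below (the helper name is hypothetical) computes that from the response above. It gives 1 + 4918762 = 4,918,763 documents, against a hits.total of 114,065,377, in a response that also reports "timed_out": true.

```python
def terms_agg_coverage(response: dict, agg_name: str) -> int:
    """Documents counted by a terms aggregation: bucket counts plus sum_other_doc_count.

    Assumes each document contributes exactly one term (a single-valued field).
    """
    agg = response["aggregations"][agg_name]
    in_buckets = sum(bucket["doc_count"] for bucket in agg["buckets"])
    return in_buckets + agg["sum_other_doc_count"]


# Numbers copied from the aggregation response above.
response = {
    "timed_out": True,
    "hits": {"total": {"value": 114065377, "relation": "eq"}},
    "aggregations": {
        "test": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 4918762,
            "buckets": [{"key": 44050081, "doc_count": 1}],
        }
    },
}

counted = terms_agg_coverage(response, "test")
total = response["hits"]["total"]["value"]
print(counted, total, f"{counted / total:.1%}")
```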
1 answer
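After some research, I noticed the "timed_out" field in the response. It means that
an Elasticsearch query with a timeout set can return partial or empty results if the timeout expires.
More details here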
After setting "timeout" to -1, I got the expected result.
I also asked my teammates and found out that default_search_timeout had been set to 2s.
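Based on the answer, a client can both lift the per-request limit and refuse to trust any response that still reports "timed_out": true. The Python sketch below is an illustration, not the poster's code. The helper names are hypothetical, and the helpers only manipulate request and response dicts. The "timeout" value of -1 comes from the answer above. A finite value such as "30s" is another option if an unbounded search is undesirable.

```python
import copy


def with_search_timeout(body: dict, timeout: str = "-1") -> dict:
    """Return a copy of a search body with an explicit request-level "timeout".

    A request-level timeout takes precedence over the cluster's default_search_timeout
    (2s in this cluster). The default of -1 follows the answer above.
    """
    new_body = copy.deepcopy(body)
    new_body["timeout"] = timeout
    return new_body


def require_complete(response: dict) -> dict:
    """Raise if Elasticsearch reported a timeout, since hits and aggregations may be partial."""
    if response.get("timed_out", False):
        raise RuntimeError(f"search timed out after {response.get('took')} ms; results may be partial")
    return response


first_query = {"_source": False, "track_total_hits": True, "size": 1, "sort": {"id": "desc"}}
body = with_search_timeout(first_query)
print(body["timeout"])
```

Copying the body keeps the original query dict reusable. Checking "timed_out" on every response catches the silent partial results described in the question, even if the timeout is later restored.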