Elasticsearch returns incorrect data when sorting by a 'long' field in descending order

Posted by ws51t4hk on 2023-06-05 in ElasticSearch

I have an Elasticsearch index with 113 million documents (about 133 GB in size):

"mappings": {
    "dynamic": "strict",
    "properties": {
        ...some fields...,
        "id": {
            "type": "long"
        }
    }
},
"settings": {
    "index": {
        ...
        "number_of_shards": "1",
        "provided_name": "users_index",
        "number_of_replicas": "1",
        ...
    }
}

When I run GET users_index/_search with this body:

{
    "_source": false,
    "track_total_hits": true,
    "size": 1,
    "sort": {
        "id": "desc"
    }
}

I get this response:

{
    "took": 1960,
    "timed_out": true,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 113978032,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "users_index",
                "_type": "_doc",
                "_id": "108384632",
                "_score": null,
                "sort": [
                    108384632
                ]
            }
        ]
    }
}

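But 108384632 is not the largest id in the index: if I search for the exact id=114098981, I get that document back. Can anyone help me with this? Could it be that, because of the index size, Elasticsearch leaves some documents out as an optimization? How do I sort documents by id correctly?

When I run the same query with the filter id > {max id from first request}: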

{
    "_source": false,
    "track_total_hits": true,
    "size": 1,
    "sort": {
        "id": "desc"
    },
    "query": {
        "range": {
            "id": {"gt": 108384632}
        }
    }
}

I get the correct result:

{
    "took": 3318,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5793959,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "users_index",
                "_type": "_doc",
                "_id": "114186327",
                "_score": null,
                "sort": [
                    114186327
                ]
            }
        ]
    }
}

I also tried to get the result with an aggregation, but the result was even stranger:

{
    "_source": false,
    "track_total_hits": true,
    "size": 0,
    "aggs": {
        "test": {
            "terms": {
                "size": 1,
                "field": "id",
                "order": {"_key": "desc"}
            }
        }
    }
}

Response:

{
    "took": 2580,
    "timed_out": true,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 114065377,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "test": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 4918762,
            "buckets": [
                {
                    "key": 44050081,
                    "doc_count": 1
                }
            ]
        }
    }
}
Answer #1 by 9nvpjoqh

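After some research, I noticed the "timed_out" field in the response: it is true in both the sorted query and the aggregation above, but false in the range query that returned the correct result. It means that an Elasticsearch query with a timeout set can return partial or empty results once the timeout expires, so the hits are computed only from the documents examined before the cutoff.
More details here.
After setting "timeout" to -1, I got the expected result: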

{
    "_source": false,
    "track_total_hits": true,
    "size": 1,
    "sort": {
        "id": "desc"
    },
    "timeout": -1
}
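The same fix can be sketched in Python with plain dictionaries, so it does not depend on any particular client library. The helper names build_max_id_query, assert_complete and max_id are hypothetical; the idea is to send an explicit "timeout" in the body and to refuse to trust any response whose "timed_out" flag is true, since its hits may be partial. The sample responses reuse values from the question.

```python
from __future__ import annotations

from typing import Any, Optional, Union


def build_max_id_query(timeout: Union[str, int] = -1) -> dict[str, Any]:
    """Body for GET users_index/_search that returns the document with the largest id.

    `timeout` overrides the cluster default for this request; -1 disables it.
    """
    return {
        "_source": False,
        "track_total_hits": True,
        "size": 1,
        "sort": {"id": "desc"},
        "timeout": timeout,
    }


def assert_complete(response: dict[str, Any]) -> dict[str, Any]:
    """Raise if the search hit its timeout, because the hits may then be partial."""
    if response.get("timed_out", False):
        raise RuntimeError(
            f"search timed out after {response.get('took')} ms; results may be partial"
        )
    return response


def max_id(response: dict[str, Any]) -> Optional[int]:
    """Return the sort value of the top hit, or None if there are no hits."""
    hits = assert_complete(response)["hits"]["hits"]
    return hits[0]["sort"][0] if hits else None


# Usage with trimmed-down versions of the responses from the question.
complete = {"took": 3318, "timed_out": False, "hits": {"hits": [{"sort": [114186327]}]}}
partial = {"took": 1960, "timed_out": True, "hits": {"hits": [{"sort": [108384632]}]}}

print(max_id(complete))  # 114186327
try:
    max_id(partial)
except RuntimeError as err:
    print(err)  # search timed out after 1960 ms; results may be partial
```

Failing loudly on "timed_out" is a deliberate choice here: for a "largest id" lookup a partial answer is silently wrong, so it is safer to raise than to return it.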

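I also asked my teammates and found out that default_search_timeout (the cluster setting search.default_search_timeout) was set to 2s, so every search without its own "timeout" was cut off after about 2 seconds.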
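Instead of passing "timeout" on every request, the cluster default itself could be relaxed, assuming you are allowed to change cluster settings. The sketch below, a hypothetical helper, only builds the body you would send with PUT _cluster/settings; search.default_search_timeout is a dynamic cluster setting, "-1" removes the limit, and a per-request "timeout" still overrides whatever default is set.

```python
from __future__ import annotations

from typing import Any


def default_timeout_settings(timeout: str = "-1") -> dict[str, Any]:
    """Body for PUT _cluster/settings that changes search.default_search_timeout.

    "-1" disables the default search timeout; a value such as "30s" sets a new limit.
    The change is stored as a persistent setting so it survives a cluster restart.
    """
    return {"persistent": {"search.default_search_timeout": timeout}}


print(default_timeout_settings("30s"))
# {'persistent': {'search.default_search_timeout': '30s'}}
```

Raising the limit (for example to "30s") rather than disabling it keeps some protection against runaway queries on a 133 GB index while still giving this sort enough time to finish.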
