I'm trying to figure out how to solve this in OpenSearch (an Elasticsearch solution is fine too).
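Essentially, I have an index of jobs that I want to rank on two parameters with equal weight: subscription tier and popularity score (both are fields on each job document).
Normally when you sort, you sort by one field first and then by the other. What I need is to blend them, giving each a 50/50 weight.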
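When jobs are sorted by relevance (the default), we want the order to combine each job's subscription tier with its individual relevance score according to a weight w, using a formula like this:
Jobs are ranked by their weighted score.
weighted score = (r1 × w) + (r2 × (1 − w)), where:
r1 = the job's position in the results for the given search when only relevance is considered; and
r2 = the job's position in the results for the given search when only subscription tier is considered.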
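The formula above amounts to a client-side rank fusion. A minimal Python sketch, assuming both full rankings for the candidate set are already in hand (which is why it normally needs one search per criterion); the function names and sample scores are hypothetical:

```python
def positions(scores: dict[str, float]) -> dict[str, int]:
    """Map each job id to its 1-based position when sorted by score, best first."""
    ordered = sorted(scores, key=lambda job: scores[job], reverse=True)
    return {job: i + 1 for i, job in enumerate(ordered)}


def weighted_rank(relevance: dict[str, float], tier: dict[str, float], w: float = 0.5) -> list[str]:
    """Order jobs by r1 * w + r2 * (1 - w); a lower value means a better rank."""
    r1 = positions(relevance)  # rank when only relevance is considered
    r2 = positions(tier)       # rank when only subscription tier is considered
    combined = {job: r1[job] * w + r2[job] * (1 - w) for job in relevance}
    return sorted(combined, key=lambda job: combined[job])


# Hypothetical relevance scores and tier values for three jobs.
relevance = {"a": 3.2, "b": 1.5, "c": 2.8}
tier = {"a": 50, "b": 5, "c": 100}
print(weighted_rank(relevance, tier, w=0.7))
print(weighted_rank(relevance, tier, w=0.3))
```

Because the formula works on positions rather than raw values, the two criteria never need a common scale, but every position depends on the whole result set.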
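The problem, however, is that I would need to run several searches to get each job's rank under each criterion, which would be very inefficient. I'm trying to find out whether OpenSearch can do this for me.
For example, I tried computing it as a script score function using just the two fields, but the fields are unrelated and not normalized against each other, so giving them equal weight is difficult.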
Here is what I've tried so far. First, I added some test documents:
POST _bulk
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","popularity_score":0.105,"bid":100}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","popularity_score":0.06,"bid":50}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","popularity_score":0.099,"bid":25}
{"index":{"_index":"tier-sort","_id":"4"}}
{"title":"Job 4","popularity_score":0.155,"bid":5}
{"index":{"_index":"tier-sort","_id":"5"}}
{"title":"Job 5","popularity_score":0.028,"bid":100}
{"index":{"_index":"tier-sort","_id":"6"}}
{"title":"Job 6","popularity_score":0.118,"bid":100}
{"index":{"_index":"tier-sort","_id":"7"}}
{"title":"Job 7","popularity_score":0.186,"bid":50}
{"index":{"_index":"tier-sort","_id":"8"}}
{"title":"Job 8","popularity_score":0.019,"bid":25}
{"index":{"_index":"tier-sort","_id":"9"}}
{"title":"Job 9","popularity_score":0.081,"bid":5}
{"index":{"_index":"tier-sort","_id":"10"}}
{"title":"Job 10","popularity_score":0.124,"bid":100}
{"index":{"_index":"tier-sort","_id":"11"}}
{"title":"Job 11","popularity_score":0.163,"bid":100}
{"index":{"_index":"tier-sort","_id":"12"}}
{"title":"Job 12","popularity_score":0.025,"bid":50}
{"index":{"_index":"tier-sort","_id":"13"}}
{"title":"Job 13","popularity_score":0.16,"bid":25}
{"index":{"_index":"tier-sort","_id":"14"}}
{"title":"Job 14","popularity_score":0.119,"bid":5}
{"index":{"_index":"tier-sort","_id":"15"}}
{"title":"Job 15","popularity_score":0.16,"bid":100}
Then I tried a script score so that each factor contributes half of the score:
GET tier-sort/_search
{
  "size": 100,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "doc['popularity_score'].value"
          },
          "weight": 0.5
        },
        {
          "script_score": {
            "script": "doc['bid'].value"
          },
          "weight": 0.5
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
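The problem, though, is normalization: bid and popularity are on completely different scales. How can I do this in Elasticsearch? Is there a native way to achieve it?
Thanks in advance!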
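To see the scale problem concretely, here is a small local Python sketch (not Elasticsearch code) that scores the test documents two ways: an equal-weight sum of the raw fields, which is what the function_score attempt amounts to once its functions are summed, and an equal-weight sum after min-max scaling each field to [0, 1]. Min-max scaling is a technique substituted here for illustration; doing it inside the engine would require each field's minimum and maximum, for example from `min`/`max` aggregations passed to a script as params, which is an untested assumption.

```python
# The question's test data: (title, popularity_score, bid)
JOBS = [
    ("Job 1", 0.105, 100), ("Job 2", 0.06, 50), ("Job 3", 0.099, 25),
    ("Job 4", 0.155, 5), ("Job 5", 0.028, 100), ("Job 6", 0.118, 100),
    ("Job 7", 0.186, 50), ("Job 8", 0.019, 25), ("Job 9", 0.081, 5),
    ("Job 10", 0.124, 100), ("Job 11", 0.163, 100), ("Job 12", 0.025, 50),
    ("Job 13", 0.16, 25), ("Job 14", 0.119, 5), ("Job 15", 0.16, 100),
]


def raw_scores(jobs, w=0.5):
    """Weighted sum of the raw fields: bid (5..100) swamps popularity (about 0.02..0.19)."""
    return {title: w * pop + (1 - w) * bid for title, pop, bid in jobs}


def min_max(value, lo, hi):
    """Scale value into [0, 1] relative to the observed range."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def normalized_scores(jobs, w=0.5):
    """Weighted sum after min-max scaling each field independently."""
    pops = [pop for _, pop, _ in jobs]
    bids = [bid for _, _, bid in jobs]
    p_lo, p_hi, b_lo, b_hi = min(pops), max(pops), min(bids), max(bids)
    return {
        title: w * min_max(pop, p_lo, p_hi) + (1 - w) * min_max(bid, b_lo, b_hi)
        for title, pop, bid in jobs
    }


def ranking(scores):
    """Titles ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


print("raw:       ", ranking(raw_scores(JOBS))[:6])
print("normalized:", ranking(normalized_scores(JOBS))[:6])
```

With raw values the top six are simply the six jobs with a bid of 100, so popularity only breaks ties. After scaling, a very popular job with a lower bid (Job 7) can outrank an unpopular one with the top bid (Job 5).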
1 Answer (by cmssoen21)
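There are two ways to change the ranking of Elasticsearch/OpenSearch search results:
1. Boosting logic (such as script_score, function_score, or rank_feature), which changes the final _score.
2. Sort logic. Results are sorted on _score by default, but if you specify a sort other than _score, the boosting logic is ignored, _score is returned as null, and only the sort takes effect.

If your two factors are on different scales, rank_feature can normalize them efficiently. For example: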
Add some documents:
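A hedged sketch of this step in Python, assuming a fresh index with the hypothetical name `tier-sort-rf` whose two numeric fields are mapped as `rank_feature` (they must hold positive values). The mapping would be sent with `PUT tier-sort-rf` and the body with `POST _bulk` using the `application/x-ndjson` content type; only the first four test documents are shown.

```python
import json

# Hypothetical index name for this sketch; rank_feature mappings need a new index.
INDEX = "tier-sort-rf"

# rank_feature fields store one positive number per document and are
# meant to be scored through the rank_feature query.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "popularity_score": {"type": "rank_feature"},
            "bid": {"type": "rank_feature"},
        }
    }
}

# A few of the question's test documents: (id, title, popularity_score, bid)
JOBS = [
    ("1", "Job 1", 0.105, 100),
    ("2", "Job 2", 0.06, 50),
    ("3", "Job 3", 0.099, 25),
    ("4", "Job 4", 0.155, 5),
]


def bulk_body(index: str, jobs) -> str:
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, title, popularity, bid in jobs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"title": title, "popularity_score": popularity, "bid": bid}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


print(json.dumps(MAPPING, indent=2))
print(bulk_body(INDEX, JOBS), end="")
```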
Apply rank_feature in the query:
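One way this query could look, sketched as a Python dict to be serialized as the search body. It assumes the rank_feature mapping from the previous step; the pivot values are hypothetical tuning parameters. Two `rank_feature` clauses with the `saturation` function each score in [0, 1), and their boosts w and 1 − w carry the 50/50 weighting.

```python
import json


def rank_feature_query(pop_pivot: float, bid_pivot: float, w: float = 0.5, size: int = 100) -> dict:
    """Search body that sums two saturation-normalized rank_feature scores.

    pop_pivot and bid_pivot are hypothetical tuning values: a field value equal
    to its pivot contributes half of that clause's boost.
    """
    if not 0 < w < 1:
        raise ValueError("w must be strictly between 0 and 1 so both boosts stay positive")
    return {
        "size": size,
        "query": {
            "bool": {
                # In a real relevance search, the text query would go in a "must"
                # clause here; the should clauses would then only add to its score.
                "should": [
                    {"rank_feature": {"field": "popularity_score", "boost": w,
                                      "saturation": {"pivot": pop_pivot}}},
                    {"rank_feature": {"field": "bid", "boost": 1 - w,
                                      "saturation": {"pivot": bid_pivot}}},
                ]
            }
        },
    }


print(json.dumps(rank_feature_query(0.1, 50), indent=2))
```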
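You can choose among the built-in rank_feature functions (saturation, log, sigmoid), tune pivot to control the results, and use the explain API to see in detail how the scores are computed, which helps you check that the query behaves as expected.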
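To get a feel for pivot tuning before running explain, the two pivot-based functions can be reproduced locally. This Python sketch re-implements the saturation (S / (S + pivot)) and sigmoid (S^exp / (S^exp + pivot^exp)) formulas as documented for Elasticsearch's rank_feature query, and uses the geometric mean of the test values as the pivot, since Elasticsearch documents an approximate geometric mean as the default saturation pivot. Real scores will differ slightly because rank_feature fields keep only about 9 significant bits of precision. To see the engine's own breakdown, `"explain": true` can be added to the search body.

```python
from statistics import geometric_mean


def saturation(s: float, pivot: float) -> float:
    """S / (S + pivot): 0.5 at the pivot, approaching 1 for large S."""
    return s / (s + pivot)


def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp): like saturation, with a tunable slope."""
    return s ** exponent / (s ** exponent + pivot ** exponent)


# Field values from the question's test documents.
POPULARITY = [0.105, 0.06, 0.099, 0.155, 0.028, 0.118, 0.186, 0.019,
              0.081, 0.124, 0.163, 0.025, 0.16, 0.119, 0.16]
BIDS = [100, 50, 25, 5, 100, 100, 50, 25, 5, 100, 100, 50, 25, 5, 100]

# Exact geometric means, standing in for the engine's approximate default pivot.
POP_PIVOT = geometric_mean(POPULARITY)
BID_PIVOT = geometric_mean(BIDS)


def blended(popularity: float, bid: float, w: float = 0.5) -> float:
    """Weighted sum of both saturation scores, like two boosted should clauses."""
    return w * saturation(popularity, POP_PIVOT) + (1 - w) * saturation(bid, BID_PIVOT)


for title, pop, bid in [("Job 5", 0.028, 100), ("Job 7", 0.186, 50), ("Job 11", 0.163, 100)]:
    print(title, round(blended(pop, bid), 3))
```

Lowering a pivot makes that field saturate sooner (values above the pivot all look similar), while raising it spreads the scores out; sigmoid's exponent additionally controls how sharp that transition is.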