OpenSearch/Elasticsearch sorting: giving two parameters equal weight/priority

Posted by b09cbbtk on 2023-03-29 in ElasticSearch

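I'm trying to figure out how to solve this in OpenSearch (an Elasticsearch solution is fine too).
Essentially, I have an index of jobs that I want to sort by two parameters with equal weight: subscription tier and popularity score (each is a field in every job document).
Normally when you sort, you sort by one field first and then by the other. What I need instead is to blend the two and give each a 50/50 weight.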
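When jobs are sorted by relevance (the default), we want the order to combine the job's subscription tier and its individual relevance score according to a weight w, using a formula such as:
Jobs are ranked by the weighted score:
$\text{weighted score} = r_1 \cdot w + r_2 \cdot (1 - w)$
where r1 = the job's rank position for a given search when only relevance is considered, and r2 = the job's rank position for the same search when only subscription is considered.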
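To make the formula concrete, here is a minimal client-side sketch, assuming you already have the two full orderings (relevance-only and subscription-only) as lists of job ids. The function names are hypothetical, and this is exactly the two-query approach the question calls inefficient; it only illustrates what the formula computes. Since r1 and r2 are rank positions, a lower weighted score means a better final position.

```python
# Hypothetical sketch of the rank-based blend from the question:
# weighted = r1 * w + r2 * (1 - w), where r1/r2 are 1-based rank positions.
from typing import Dict, List


def rank_positions(ids_in_order: List[str]) -> Dict[str, int]:
    """Map each job id to its 1-based position in an ordered result list."""
    return {job_id: pos for pos, job_id in enumerate(ids_in_order, start=1)}


def weighted_rank(relevance_order: List[str],
                  subscription_order: List[str],
                  w: float = 0.5) -> List[str]:
    """Blend two orderings of the same ids; lower weighted score ranks first."""
    r1 = rank_positions(relevance_order)
    r2 = rank_positions(subscription_order)
    scores = {job_id: r1[job_id] * w + r2[job_id] * (1 - w) for job_id in r1}
    # Ties are broken by relevance rank so the result is deterministic.
    return sorted(scores, key=lambda j: (scores[j], r1[j]))


if __name__ == "__main__":
    relevance = ["A", "B", "C", "D"]
    subscription = ["C", "D", "A", "B"]
    print(weighted_rank(relevance, subscription))  # ['A', 'C', 'B', 'D']
```

With w = 0.5, A and C both score 2 and B and D both score 3; the relevance tie-break puts A before C and B before D.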
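The problem is that I would need to run several searches to get each job's rank under each sort criterion, which would be very inefficient. I'm trying to find out whether OpenSearch can solve this for me.
For example, I tried computing it as a script score function using only the two fields, but they are completely unrelated and not normalized against each other, so assigning them equal weight is difficult.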
Here is what I have tried so far. First, add some test documents:

POST _bulk
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","popularity_score":0.105,"bid":100}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","popularity_score":0.06,"bid":50}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","popularity_score":0.099,"bid":25}
{"index":{"_index":"tier-sort","_id":"4"}}
{"title":"Job 4","popularity_score":0.155,"bid":5}
{"index":{"_index":"tier-sort","_id":"5"}}
{"title":"Job 5","popularity_score":0.028,"bid":100}
{"index":{"_index":"tier-sort","_id":"6"}}
{"title":"Job 6","popularity_score":0.118,"bid":100}
{"index":{"_index":"tier-sort","_id":"7"}}
{"title":"Job 7","popularity_score":0.186,"bid":50}
{"index":{"_index":"tier-sort","_id":"8"}}
{"title":"Job 8","popularity_score":0.019,"bid":25}
{"index":{"_index":"tier-sort","_id":"9"}}
{"title":"Job 9","popularity_score":0.081,"bid":5}
{"index":{"_index":"tier-sort","_id":"10"}}
{"title":"Job 10","popularity_score":0.124,"bid":100}
{"index":{"_index":"tier-sort","_id":"11"}}
{"title":"Job 11","popularity_score":0.163,"bid":100}
{"index":{"_index":"tier-sort","_id":"12"}}
{"title":"Job 12","popularity_score":0.025,"bid":50}
{"index":{"_index":"tier-sort","_id":"13"}}
{"title":"Job 13","popularity_score":0.16,"bid":25}
{"index":{"_index":"tier-sort","_id":"14"}}
{"title":"Job 14","popularity_score":0.119,"bid":5}
{"index":{"_index":"tier-sort","_id":"15"}}
{"title":"Job 15","popularity_score":0.16,"bid":100}

Then I tried a script score so that each factor contributes half of the score:

GET tier-sort/_search
{
  "size": 100,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "doc['popularity_score'].value"
          },
          "weight": 0.5
        },
        {
          "script_score": {
            "script": "doc['bid'].value"
          },
          "weight": 0.5
        }
      ],
      "score_mode": "sum"
    }
  }
}
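The scale problem can be seen without a cluster. The sketch below, using the question's test data, sums half of each raw value (roughly what a summed, equally weighted function_score produces) and compares that with a client-side min-max rescaling of both fields to [0, 1]. Min-max scaling is a swapped-in technique, not something the engine does natively here, and the function names are hypothetical. With raw values, the top six results are all bid-100 jobs ordered by popularity; after rescaling, Job 7 (highest popularity, bid 50) displaces Job 5 (bid 100, near-lowest popularity).

```python
# Sketch: summing raw popularity_score and bid lets bid dominate; min-max
# scaling (an assumed, client-side normalization) puts both on [0, 1].
from typing import Dict, List, Tuple

# (job id, popularity_score, bid) copied from the bulk request above.
JOBS: List[Tuple[str, float, float]] = [
    ("1", 0.105, 100), ("2", 0.06, 50), ("3", 0.099, 25), ("4", 0.155, 5),
    ("5", 0.028, 100), ("6", 0.118, 100), ("7", 0.186, 50), ("8", 0.019, 25),
    ("9", 0.081, 5), ("10", 0.124, 100), ("11", 0.163, 100), ("12", 0.025, 50),
    ("13", 0.16, 25), ("14", 0.119, 5), ("15", 0.16, 100),
]


def min_max(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def blended_scores(jobs: List[Tuple[str, float, float]], w: float = 0.5,
                   normalize: bool = True) -> Dict[str, float]:
    """Score = w * popularity + (1 - w) * bid, optionally after min-max scaling."""
    pops = [p for _, p, _ in jobs]
    bids = [b for _, _, b in jobs]
    if normalize:
        pops, bids = min_max(pops), min_max(bids)
    return {job_id: w * p + (1 - w) * b
            for (job_id, _, _), p, b in zip(jobs, pops, bids)}


def top(scores: Dict[str, float], n: int = 6) -> List[str]:
    """Return the n highest-scoring job ids."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    print(top(blended_scores(JOBS, normalize=False)))  # ['11', '15', '10', '6', '1', '5']
    print(top(blended_scores(JOBS)))                   # ['11', '15', '10', '6', '1', '7']
```

Min-max scaling needs the current minimum and maximum of each field, which would have to be fetched (or fixed) separately, so it does not by itself remove the extra round trip.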

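The problem, however, is normalization: bid and popularity are on completely different scales. How can I achieve this in Elasticsearch? Is there a native way to do it?
Thanks in advance!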


Answer #1 by cmssoen2

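There are two ways to change the ranking of Elasticsearch/OpenSearch search results:
1. Add boosting logic (such as script_score, function_score, or rank_feature) that changes the final _score.

2. Sort on a field, or specify your own sort logic. By default ES sorts on _score, but if you specify sort logic other than _score, the boosting logic is ignored: _score is set to null and only the sort takes effect.
If your two factors are on different scales, rank_feature can help you normalize them effectively. For example,
add some documents: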
POST _bulk
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","rank":{"popularity_score":0.105,"bid":100}}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","rank":{"popularity_score":0.06,"bid":50}}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","rank":{"popularity_score":0.099,"bid":25}}
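The bulk request above does not show a mapping. As far as I know, the rank_feature query only works on fields mapped as rank_feature (or rank_features), whereas dynamic mapping would turn rank.popularity_score and rank.bid into ordinary numeric fields. Because the question already created tier-sort with a different structure, the index would probably need to be deleted and recreated before indexing these documents. The sketch below only builds the create-index body as JSON (the helper name is hypothetical); the printed JSON would be sent as the body of `PUT tier-sort`. Both fields keep the default positive_score_impact, meaning higher values score higher.

```python
# Hypothetical helper: build the create-index body that maps the nested
# "rank" fields as rank_feature, so the rank_feature query can target them.
import json
from typing import List

RANK_FIELDS = ["popularity_score", "bid"]  # names taken from the answer's documents


def tier_sort_index_body(fields: List[str] = RANK_FIELDS) -> dict:
    """Return a create-index body with one rank_feature sub-field per name."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "rank": {
                    "properties": {name: {"type": "rank_feature"} for name in fields}
                },
            }
        }
    }


if __name__ == "__main__":
    # Use the printed JSON as the body of: PUT tier-sort
    print(json.dumps(tier_sort_index_body(), indent=2))
```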

Apply rank_feature in the query:

GET tier-sort/_search
{
  "size": 100,
  "query": {
    "bool": {
       "should": [
         {
           "rank_feature": {
             "field": "rank.popularity_score",
             "saturation": {},
             "boost": 0.5
           }
         },
         {
           "rank_feature": {
             "field": "rank.bid",
             "saturation": {},
             "boost": 0.5
           }
         }
       ]
    }
  }
}

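You can pick a different built-in rank_feature function (such as saturation, log, or sigmoid), tune the pivot to control the results, and use the explain API to see in detail how each score is computed, which helps you check that the query behaves as expected.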
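To illustrate why this evens out the scales, here is a rough offline approximation of the answer's query on its three sample documents. The saturation function scores a value S as S / (S + pivot), which always falls in (0, 1) whatever the field's units. Each clause is multiplied by its boost of 0.5, and the bool should clauses are summed. When saturation has no pivot, the engine uses a default computed from index statistics (documented as an approximate geometric mean of the field's values); this sketch assumes an exact geometric mean, so real scores will differ slightly. The explain API shows the actual numbers. Function and variable names are hypothetical.

```python
# Rough approximation of the answer's bool/should query: each rank_feature
# clause scores boost * S / (S + pivot), and should-clause scores are summed.
# The pivot is ASSUMED to be the exact geometric mean of the values.
import math
from typing import Dict, List

# The answer's three sample documents (rank.popularity_score, rank.bid).
DOCS: Dict[str, Dict[str, float]] = {
    "1": {"popularity_score": 0.105, "bid": 100.0},
    "2": {"popularity_score": 0.06, "bid": 50.0},
    "3": {"popularity_score": 0.099, "bid": 25.0},
}


def geometric_mean(values: List[float]) -> float:
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def saturation(value: float, pivot: float) -> float:
    """Saturation function S / (S + pivot): maps positive values into (0, 1)."""
    return value / (value + pivot)


def combined_scores(docs: Dict[str, Dict[str, float]],
                    boosts: Dict[str, float]) -> Dict[str, float]:
    """Sum boost * saturation(value, pivot) over the features for each doc."""
    pivots = {f: geometric_mean([d[f] for d in docs.values()]) for f in boosts}
    return {doc_id: sum(boosts[f] * saturation(d[f], pivots[f]) for f in boosts)
            for doc_id, d in docs.items()}


if __name__ == "__main__":
    scores = combined_scores(DOCS, {"popularity_score": 0.5, "bid": 0.5})
    print(sorted(scores, key=scores.get, reverse=True))  # ['1', '2', '3']
```

A value equal to its pivot contributes exactly half of its clause's boost. Lowering a field's pivot makes that field saturate sooner, and changing the boosts moves the blend away from 50/50.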
