Using NestedPath in an Elasticsearch script sort does not allow access to outer properties

jk9hmnmh  posted on 2022-11-02  in  ElasticSearch

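I need to sort by a script that combines two pieces of logic. For each document, the minimum distance (from the headquarters and from the offices to a given location) is computed and returned for sorting. Since I can only return one value, I need to combine the script that computes the distance between the headquarters and the given location with the one that computes the distance between each of the offices and the given location.
I tried combining these properties, but offices is a nested property while headquarters is not. If I use "NestedPath", I somehow cannot access the headquarters property. Without "NestedPath", I cannot use the offices property. Here is the mapping: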

"offices" : {
            "type" : "nested",
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          },
        "headquarters" : {
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          }

Here is the script I tried:

"sort": [
    {
      "_script": {
        "nested" : {
          "path" : "offices"
        },
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
        },
        "type": "Number"
      }
    }
  ],

When I run the script, the headquarters logic does not even seem to execute; the results I get are based on the office distance only.

pdkcd3nj  #1

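Nested fields operate in a separate context, and their content cannot be accessed from the outer level, nor vice versa.

You can, however, access the document's raw _source.

But there's a catch:

  • When iterating under the offices nested path, you were able to call .arcDistance because coordinates is of type ScriptDocValues.GeoPoint.
  • Once you access the raw _source, however, you are dealing with an unoptimized set of java.util.ArrayLists and java.util.HashMaps.

This means that even though you can iterate over the array list: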

...
for (def office : params._source['offices']) {
   // office.coordinates is a trivial HashMap of {lat, lon}!
}

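calculating the geo distance will not be possible
...unless you write your own geoDistance function, which is perfectly fine in Painless, but it needs to be defined at the top of the script.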
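
Because _source holds whatever was indexed, a geo_point there may arrive as a "lat,lon" string, a {lat, lon} object, or a [lon, lat] array. The Python sketch below is only an illustration of normalizing those three shapes outside Elasticsearch; parse_geo_point is a hypothetical helper name, not part of any Elasticsearch API, and geohash or WKT values are deliberately not handled.

```python
def parse_geo_point(value):
    """Return (lat, lon) as floats from a raw _source geo_point value.

    Hypothetical helper for illustration only. Handles three of the
    formats Elasticsearch accepts for geo_point fields:
      - object: {"lat": 40.7128, "lon": -74.006}
      - string: "40.7128,-74.006"   (lat first)
      - array:  [-74.006, 40.7128]  (lon first)
    """
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)):
        lon, lat = value  # note the reversed order in the array form
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point value: {value!r}")


# A plain-dict stand-in for what params._source exposes to the script.
source = {
    "offices": [{"coordinates": "39.9,-74.92", "state": "New Jersey"}],
    "headquarters": {"coordinates": {"lat": 40.7128, "lon": -74.006}},
}

office_points = [parse_geo_point(o["coordinates"]) for o in source["offices"]]
hq_point = parse_geo_point(source["headquarters"]["coordinates"])
```

A script that reads _source has to make the same kind of decision for every format that was actually indexed, since none of the doc-values conveniences apply there.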

Sample implementation

Assuming your documents look like this:

POST my-index/_doc
{
  "offices": [
    {
      "coordinates": "39.9,-74.92",
      "state": "New Jersey"
    }
  ],
  "headquarters": {
    "coordinates": {
      "lat": 40.7128,
      "lon": -74.006
    },
    "state": "NYC"
  }
}

Your sort script could look like this:

GET my-index/_search
{
   "sort": [
    {
      "_script": {
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": """
            // We can declare functions at the beginning of a Painless script
            // https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions

            double deg2rad(double deg) {
              return (deg * Math.PI / 180.0);
            }

            double rad2deg(double rad) {
              return (rad * 180.0 / Math.PI);
            }

            // https://stackoverflow.com/a/3694410/8160318
            double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
              double theta = lon1 - lon2;
              double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
              dist = Math.acos(dist);
              dist = rad2deg(dist);
              return dist * 60 * 1.1515;
            }

            // start off arbitrarily high            
            def hqDistance = 1000000;

            if (!doc['headquarters.coordinates'].empty) {
              hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
            }

            // assume office distance as large as hq distance
            def officeDistance = hqDistance;

            // iterate each office and compare it to the currently lowest officeDistance
            for (def office : params._source['offices']) {
              // the coordinates are formatted as "lat,lon" so let's split...
              def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
              // ...and parse them before passing onwards
              def tmpOfficeDistance = geoDistanceInMiles(Float.parseFloat(latLong[0]),
                                                         Float.parseFloat(latLong[1]),
                                                         params.lat,
                                                         params.lon);
              // we're interested in the nearest office...
              if (tmpOfficeDistance < officeDistance) {
                officeDistance = tmpOfficeDistance;
              }
            }

            if (hqDistance < officeDistance) {
              return hqDistance;
            }

            return officeDistance;
          """
        },
        "type": "Number"
      }
    }
  ]
}
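
To sanity-check the values such a sort script produces, the same arithmetic can be reproduced outside Elasticsearch. The following Python sketch mirrors the spherical-law-of-cosines formula from the script; the function names are hypothetical, and the clamp before acos is an added safeguard (an assumption, not part of the original script) against rounding pushing the cosine slightly outside [-1, 1].

```python
import math


def geo_distance_in_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines, in miles."""
    theta = math.radians(lon1 - lon2)
    cos_angle = (
        math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.cos(theta)
    )
    # Added safeguard: keep the value inside acos's domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    # Degrees of arc -> nautical miles (x 60) -> statute miles (x 1.1515).
    return math.degrees(math.acos(cos_angle)) * 60 * 1.1515


def nearest_distance(hq, offices, lat, lon, default=1_000_000.0):
    """Smallest distance from (lat, lon) to the HQ or any office.

    hq is a (lat, lon) tuple or None; offices is a list of (lat, lon) tuples.
    The default plays the role of the script's "arbitrarily high" start value.
    """
    candidates = [default]
    if hq is not None:
        candidates.append(geo_distance_in_miles(hq[0], hq[1], lat, lon))
    for office_lat, office_lon in offices:
        candidates.append(geo_distance_in_miles(office_lat, office_lon, lat, lon))
    return min(candidates)


# The sample document and the script params used above.
hq = (40.7128, -74.006)
offices = [(39.9, -74.92)]
sort_value = nearest_distance(hq, offices, 28.9672, -98.4786)
```

For the sample document the New Jersey office lies closer to the given location than the headquarters, so the office distance is what ends up as the sort value.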

Shameless plug: I delve into Elasticsearch scripting in a dedicated chapter of my ES Handbook.
