Matching documents in Elasticsearch?

Posted by cunj1qz1 on 2021-06-13 in ElasticSearch

I have the following data in Elasticsearch:

[
{
    "_index": "media",
    "_type": "information",
    "_id": "6838",
    "_source": {
        "demographics_countries": {
            "AE": 0.17543859649122806,
            "CA": 0.013157894736842105,
            "FR": 0.017543859649122806,
            "GB": 0.043859649122807015,
            "IT": 0.02631578947368421,
            "LB": 0.013157894736842105,
            "SA": 0.49122807017543857,
            "TR": 0.017543859649122806,
            "US": 0.09210526315789472
        }
    }
},
{
    "_index": "media",
    "_type": "information",
    "_id": "57696",
    "_source": {
        "demographics_countries": {
            "TN": 0.8125,
            "MA": 0.034375,
            "DZ": 0.032812,
            "FR": 0.0125,
            "EG": 0.0125,
            "IN": 0.009375,
            "SA": 0.009375
        }
    }
}
]

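Expected result:
Find the documents in which a specific country, e.g. SA (Saudi Arabia), ranks among the top 3 entries of demographics_countries. For example:
"_id": "6838" (the first document) matches because SA (Saudi Arabia) is among the top 3 values of demographics_countries in the sample documents above.
What I have tried: I tried filtering with top hits, but it did not work as intended.
Any suggestions would be appreciated.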

vdgimpew #1

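With the current data model this is hard to do. What I suggest may not be the easiest approach, but it should end up being the fastest one to query.
I would recommend remodeling your documents so that they contain the top countries: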

[
  {
    "_index": "media",
    "_type": "information",
    "_id": "6838",
    "_source": {
      "top_demographics_countries": ["SA", "AE", "US"],
      "demographics_countries": {
        "AE": 0.17543859649122806,
        "CA": 0.013157894736842105,
        "FR": 0.017543859649122806,
        "GB": 0.043859649122807015,
        "IT": 0.02631578947368421,
        "LB": 0.013157894736842105,
        "SA": 0.49122807017543857,
        "TR": 0.017543859649122806,
        "US": 0.09210526315789472
      }
    }
  },
  {
    "_index": "media",
    "_type": "information",
    "_id": "57696",
    "_source": {
      "top_demographics_countries": ["TN", "MA", "DZ"],
      "demographics_countries": {
        "TN": 0.8125,
        "MA": 0.034375,
        "DZ": 0.032812,
        "FR": 0.0125,
        "EG": 0.0125,
        "IN": 0.009375,
        "SA": 0.009375
      }
    }
  }
]

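Treat the values shown for top_demographics_countries as illustrative only. With this approach you precompute the top countries whenever a document is saved, and you can then use a simple term query to check whether a document contains the value: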

{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "top_demographics_countries": "SA"
        }
      }
    }
  }
}

Computing them once at save time is cheaper than building the clauses dynamically on every query.
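As a rough illustration of that precomputation step, the top countries could be derived on the client side before each document is indexed. This is a minimal sketch, not the answerer's code: top_countries and with_top_countries are hypothetical helper names, and the two dictionaries are the sample documents from the question.

```python
from typing import Dict, List


def top_countries(demographics: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n country codes with the largest share, highest first."""
    # Sort by share descending; ties are broken by country code so the
    # result is deterministic.
    ranked = sorted(demographics.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:n]]


def with_top_countries(source: dict, n: int = 3) -> dict:
    """Return a copy of a document source with top_demographics_countries added."""
    enriched = dict(source)
    enriched["top_demographics_countries"] = top_countries(
        source.get("demographics_countries", {}), n
    )
    return enriched


# Sample sources copied from the question.
doc_6838 = {"demographics_countries": {
    "AE": 0.17543859649122806, "CA": 0.013157894736842105,
    "FR": 0.017543859649122806, "GB": 0.043859649122807015,
    "IT": 0.02631578947368421, "LB": 0.013157894736842105,
    "SA": 0.49122807017543857, "TR": 0.017543859649122806,
    "US": 0.09210526315789472}}
doc_57696 = {"demographics_countries": {
    "TN": 0.8125, "MA": 0.034375, "DZ": 0.032812, "FR": 0.0125,
    "EG": 0.0125, "IN": 0.009375, "SA": 0.009375}}

print(with_top_countries(doc_6838)["top_demographics_countries"])   # ['SA', 'AE', 'US']
print(with_top_countries(doc_57696)["top_demographics_countries"])  # ['TN', 'MA', 'DZ']
```

The enriched source is what would be sent to the index. Because a term query matches exact values, top_demographics_countries would need to be mapped as keyword (or queried through the .keyword sub-field that default dynamic mapping creates) for "SA" to match.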

sc4hvdpw #2

@Evaldas is right: it is better to extract the top 3 ahead of time.
But if you can't help yourself and feel like using Java/Painless, here is one way:

{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "demographics_countries.SA"
          }
        },
        {
          "script": {
            "script": {
              "source": """
                def tuple_list = new ArrayList();                

                for (def c : params.all_countries) {
                  def key = 'demographics_countries.'+c;
                  if (!doc.containsKey(key) || doc[key].size() == 0) {
                    continue;
                  }
                  def val = doc[key].value;
                  tuple_list.add([c, val]);
                }

                // bail out early: subList(0, 3) would throw on fewer than 3 entries
                if (tuple_list.size() < 3) {
                  return false;
                }

                // sort tuple list by the country values, highest first (0 on ties)
                Collections.sort(tuple_list, (arr1, arr2) -> arr1[1] < arr2[1] ? 1 : (arr1[1] > arr2[1] ? -1 : 0));

                // slice & take only the top 3
                def top_3_countries = tuple_list.subList(0, 3).stream().map(arr -> arr[0]).collect(Collectors.toList());

                return top_3_countries.contains(params.country_of_interest);
              """,
              "params": {
                "country_of_interest": "SA",
                "all_countries": [
                  "AE",
                  "CA",
                  "FR",
                  "GB",
                  "IT",
                  "LB",
                  "SA",
                  "TR",
                  "US",
                  "TN",
                  "MA",
                  "DZ",
                  "EG",
                  "IN"
                ]
              }
            }
          }
        }
      ]
    }
  }
}
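
To sanity-check the ranking rule outside Elasticsearch, the script's logic can be mirrored in plain Python. This is only a sketch: in_top_n is a hypothetical helper rather than part of any Elasticsearch API, and the two dictionaries are the sample documents from the question.

```python
from typing import Dict, List


def in_top_n(demographics: Dict[str, float], country: str,
             all_countries: List[str], n: int = 3) -> bool:
    """Return True if `country` is among the n largest shares of a document."""
    # Keep only the candidate countries that are present in this document.
    pairs = [(c, demographics[c]) for c in all_countries if c in demographics]
    if len(pairs) < n:
        # Like the script, require at least n countries before matching.
        return False
    # Sort by share, highest first, and keep the first n country codes.
    pairs.sort(key=lambda p: p[1], reverse=True)
    return country in [c for c, _ in pairs[:n]]


# Same list as the script's params.all_countries.
ALL_COUNTRIES = ["AE", "CA", "FR", "GB", "IT", "LB", "SA",
                 "TR", "US", "TN", "MA", "DZ", "EG", "IN"]

# demographics_countries of the two sample documents from the question.
doc_6838 = {
    "AE": 0.17543859649122806, "CA": 0.013157894736842105,
    "FR": 0.017543859649122806, "GB": 0.043859649122807015,
    "IT": 0.02631578947368421, "LB": 0.013157894736842105,
    "SA": 0.49122807017543857, "TR": 0.017543859649122806,
    "US": 0.09210526315789472}
doc_57696 = {
    "TN": 0.8125, "MA": 0.034375, "DZ": 0.032812, "FR": 0.0125,
    "EG": 0.0125, "IN": 0.009375, "SA": 0.009375}

print(in_top_n(doc_6838, "SA", ALL_COUNTRIES))   # True
print(in_top_n(doc_57696, "SA", ALL_COUNTRIES))  # False
```

Only document 6838 matches, which agrees with the expected result in the question. Keep in mind that a script query evaluates this logic for every candidate document at search time, which is why precomputing the top 3 at index time is usually the cheaper option.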
