What is the ideal structure for storing and querying nested data in Elasticsearch?

Posted by mzmfm0qo on 2023-06-29 in ElasticSearch

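I am running a function_score query in Elasticsearch in which the value of a nested field (price in this example) is used in the score calculation. Which of the following is the better way to index the data?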

{
  "name": "John",
  "age": 50,
  "country": "US",
  "subscription": {
    "Plan1": {
      "price": 100,
      "date": "June 5th"
    },
    "Plan2": {
      "price": 50,
      "date": "June 6th"
    }
  }
}

OR

{
  "name": "John",
  "age": 50,
  "country": "US",
  "subscription": [
    {
      "name": "Plan1",
      "price": 100,
      "date": "June 5th"
    },
    {
      "name": "Plan2",
      "price": 50,
      "date": "June 6th"
    }
  ]
}

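The query will filter on the plan name and the price, and the price will be used in the score calculation. The number of plans may exceed 20.
Edit 1: sample query for approach 2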

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "createdatutc": {
                  "gte": "2022-11-01T00:00:00.000Z",
                  "lt": "2023-05-06T00:00:00.000Z",
                  "format": "strict_date_optional_time"
                }
              }
            },
            {
              "terms": {
                "country": [
                  "US"
                ]
              }
            },
            {
              "term": {
                "subscription.name": {
                  "value": "Plan1"
                } 
              }
            }
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "query_string": {
              "default_field": "name",
              "query": "\"john\""
            }
          },
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
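
For reference, the selection logic of the script_score in the query above can be mirrored in plain Python. This is only a sketch of the loop, not Painless and not part of any Elasticsearch client: the function name `plan_price` and the `default` fallback are hypothetical. The script in the question has no return statement after the loop, so the value used here when no plan matches is an assumption.

```python
# Sample document in the shape of approach 2 (array of plan objects).
doc_source = {
    "name": "John",
    "age": 50,
    "country": "US",
    "subscription": [
        {"name": "Plan1", "price": 100, "date": "June 5th"},
        {"name": "Plan2", "price": 50, "date": "June 6th"},
    ],
}


def plan_price(source, plan_name, default=0):
    """Return the price of the first plan whose name equals plan_name.

    `default` is a hypothetical fallback for documents that do not
    contain the requested plan.
    """
    for item in source.get("subscription", []):
        if item.get("name") == plan_name:
            return item["price"]
    return default


price_plan1 = plan_price(doc_source, "Plan1")
price_missing = plan_price(doc_source, "Plan9")
print(price_plan1, price_missing)
```

The loop has to scan every plan of every scored document, which is one reason the number of plans per document matters for this approach.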

Answer #1 (d6kp6zgx)

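It depends on how many plans you have. If there are only two or a handful, the first option is better; otherwise you need to map subscription as a nested object, and nested objects are not optimal for query performance.
With the first option, a single condition on subscription.Plan1.price: 100 filters on both the plan name and the price. With the second option you need two conditions, one on subscription.name: Plan1 and another on subscription.price: 100, and both must hold for the same plan, which is why subscription has to be nested. Without the nested type, Elasticsearch flattens the array of objects, so the two conditions could be satisfied by different plans.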
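
The need for nested in the second option can be illustrated with a small simulation. The sketch below mimics, in plain Python, how an array of objects behaves when it is flattened into multi-valued fields versus when each object is matched on its own. It does not call Elasticsearch, and the helper names `flatten_object_array`, `matches_flattened`, and `matches_nested` are hypothetical.

```python
from collections import defaultdict

doc = {
    "name": "John",
    "subscription": [
        {"name": "Plan1", "price": 100, "date": "June 5th"},
        {"name": "Plan2", "price": 50, "date": "June 6th"},
    ],
}


def flatten_object_array(field, objects):
    """Mimic a non-nested object array: each sub-field becomes one multi-valued field."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_flattened(flat, conditions):
    """Each condition is checked against the whole value list, independently of the others."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


def matches_nested(objects, conditions):
    """Mimic a nested match: all conditions must hold inside the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


flat = flatten_object_array("subscription", doc["subscription"])

# Plan1 costs 100, not 50, so this document should NOT match.
wrong_pair_flat = matches_flattened(
    flat, {"subscription.name": "Plan1", "subscription.price": 50}
)
wrong_pair_nested = matches_nested(doc["subscription"], {"name": "Plan1", "price": 50})

print(wrong_pair_flat)    # flattened fields lose the name/price pairing
print(wrong_pair_nested)  # per-object matching keeps it
```

With the first option this problem does not arise, because the plan name is part of the field path (subscription.Plan1.price), so a single condition already ties the price to the plan.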

UPDATE 1: using option 1

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "createdatutc": {
                  "gte": "2022-11-01T00:00:00.000Z",
                  "lt": "2023-05-06T00:00:00.000Z",
                  "format": "strict_date_optional_time"
                }
              }
            },
            {
              "terms": {
                "country": [
                  "US"
                ]
              }
            },
            {
              "exists": {
                "field": "subscription.Plan1.price"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "query_string": {
              "default_field": "name",
              "query": "\"john\""
            }
          },
          "field_value_factor": {
            "field": "subscription.Plan1.price",
            "factor": 1.2
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
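
Because option 1 puts the plan name into the field path, the request body has to be built per plan. A minimal sketch of such a builder follows, assuming the same filters as the query above; `build_option1_query` and `expected_score` are hypothetical helper names, and the date range and country values are simply copied from the example. The score helper assumes field_value_factor is used without a modifier, in which case the function value is the factor multiplied by the field value.

```python
import json


def build_option1_query(plan, factor=1.2):
    """Build the function_score request body for one plan (option 1 layout)."""
    price_field = f"subscription.{plan}.price"
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "filter": [
                            {
                                "range": {
                                    "createdatutc": {
                                        "gte": "2022-11-01T00:00:00.000Z",
                                        "lt": "2023-05-06T00:00:00.000Z",
                                        "format": "strict_date_optional_time",
                                    }
                                }
                            },
                            {"terms": {"country": ["US"]}},
                            # Only keep documents that actually have this plan.
                            {"exists": {"field": price_field}},
                        ]
                    }
                },
                "functions": [
                    {
                        "filter": {
                            "query_string": {
                                "default_field": "name",
                                "query": "\"john\"",
                            }
                        },
                        "field_value_factor": {
                            "field": price_field,
                            "factor": factor,
                        },
                    }
                ],
                "score_mode": "sum",
                "boost_mode": "replace",
            }
        }
    }


def expected_score(price, factor=1.2):
    """Function value for a document matching the function filter (no modifier)."""
    return factor * price


body = build_option1_query("Plan2")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the search request body by whatever client is in use.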
