Elasticsearch: change a field's type in an index without reindexing

Posted by vom3gejh on 2023-02-18 in ElasticSearch

First, I have an index template:

GET localhost:9200/_index_template/document

This is the output:

{
  "index_templates": [
    {
      "name": "document",
      "index_template": {
        "index_patterns": [
          "v*-documents-*"
        ],
        "template": {
          "settings": {
            "index": {
              "number_of_shards": "1"
            }
          },
          "mappings": {
            "properties": {
              "firstOperationAtUtc": {
                "format": "epoch_millis",
                "ignore_malformed": true,
                "type": "date"
              },
              "firstOperationAtUtcDate": {
                "ignore_malformed": true,
                "type": "date"
              }
            }
          },
          "aliases": {
            "documents-": {}
          }
        },
        "composed_of": [],
        "priority": 501,
        "version": 1
      }
    }
  ]
}
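The template's `index_patterns` entry `v*-documents-*` is a wildcard pattern that Elasticsearch compares against an index's name when that index is created. As a rough illustration only, Python's `fnmatch.fnmatchcase` treats `*` the same way. It also understands `?` and `[...]`, which Elasticsearch patterns do not, so this is an approximation, and `template_applies` is a hypothetical helper name:

```python
from fnmatch import fnmatchcase

# Pattern copied from the "document" template above.
INDEX_PATTERNS = ["v*-documents-*"]

def template_applies(index_name: str, patterns=INDEX_PATTERNS) -> bool:
    """Hypothetical helper: True if a newly created index with this name
    would match one of the template's patterns (approximation of '*' matching)."""
    return any(fnmatchcase(index_name, p) for p in patterns)

if __name__ == "__main__":
    for name in ["v2-documents-2021-11-20", "documents-", "v2-orders-2021-11-20"]:
        print(name, template_applies(name))
```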

For example, my data is indexed as follows:

GET localhost:9200/v2-documents-2021-11-20/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "exists": {
            "field": "firstOperationAtUtc"
          }
        }
      ]
    }
  }
}

The output is:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "v2-documents-2021-11-20",
                "_type": "_doc",
                "_id": "9b46d6fe78735274342d1bc539b084510000000455",
                "_score": 1.0,
                "_source": {
                    "firstOperationAtUtc": 1556868952000,
                    "firstOperationAtUtcDate": "2019-05-03T13:35:52.000Z"
                }
            }
        ]
    }
}
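With `"format": "epoch_millis"`, `firstOperationAtUtc` holds milliseconds since the Unix epoch, while `firstOperationAtUtcDate` has no explicit format and holds an ISO 8601 string. A minimal standard-library sketch for converting between the two representations (the helper names are hypothetical) makes it easy to check the values in a hit:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_iso(ms: int) -> str:
    """Epoch milliseconds -> ISO 8601 UTC string with millisecond precision."""
    dt = EPOCH + timedelta(milliseconds=ms)  # integer arithmetic, no float rounding
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

def iso_to_millis(value: str) -> int:
    """ISO 8601 string ending in 'Z' -> epoch milliseconds."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

For the hit above, `millis_to_iso(1556868952000)` returns `2019-05-03T07:35:52.000Z`, six hours earlier than the `firstOperationAtUtcDate` value `2019-05-03T13:35:52.000Z`, so the two fields in this document do not hold the same instant.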

Next, I need to update the mapping of the field firstOperationAtUtc and remove the epoch_millis format:

PUT localhost:9200/_index_template/document
{
  "index_patterns": [
    "v*-documents-*"
  ],
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "1"
      }
    },
    "mappings": {
      "properties": {
        "firstOperationAtUtc": {
          "ignore_malformed": true,
          "type": "date"
        },
        "firstOperationAtUtcDate": {
          "ignore_malformed": true,
          "type": "date"
        }
      }
    },
    "aliases": {
      "documents-": {}
    }
  },
  "version": 1
}

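After that, if I run the previous request again, I still get the indexed data.
But now I need to update the field firstOperationAtUtc and set it from the data in firstOperationAtUtcDate: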

POST localhost:9200/v2-documents-2021-11-20/_update_by_query
{
  "script": {
    "source": "if (ctx._source.firstOperationAtUtcDate != null) { ctx._source.firstOperationAtUtc = ctx._source.firstOperationAtUtcDate }",
    "lang": "painless"
  },
  "query": {
    "match": {
      "_id": "9b46d6fe78735274342d1bc539b084510000000455"
    }
  }
}

After that, if I run the previous request again:

GET localhost:9200/v2-documents-2021-11-20/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "exists": {
            "field": "firstOperationAtUtc"
          }
        }
      ]
    }
  }
}

I get no results:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

But if I look the document up by its ID, I get it back with the modified data, except that my field is listed as ignored:

GET localhost:9200/v2-documents-2021-11-20/_search
{
  "query": {
    "terms": {
      "_id": [ "9b46d6fe78735274342d1bc539b084510000000455" ] 
    }
  }
}

The output is:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "v2-documents-2021-11-20",
                "_type": "_doc",
                "_id": "9b46d6fe78735274342d1bc539b084510000000455",
                "_score": 1.0,
                "_ignored": [
                    "firstOperationAtUtc"
                ],
                "_source": {
                    "firstOperationAtUtc": "2019-05-03T13:35:52.000Z",
                    "firstOperationAtUtcDate": "2019-05-03T13:35:52.000Z"
                }
            }
        ]
    }
}

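How can I get this data indexed without reindexing? I have millions of documents in the index, and reindexing could cause significant downtime in production.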

Answer 1, by szqfcxe2

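You changed the index template, not the index mapping. An index template is only used when a new index whose name matches its pattern is created.
What you want to do is modify the actual mapping of the index, like this: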

PUT test/_mapping
{
  "properties": {
    "firstOperationAtUtc": {
      "ignore_malformed": true,
      "type": "date"
    }
  }
}

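However, this is not possible. You will get the following error, which makes sense, because you cannot modify the mapping of an existing field: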

Mapper for [firstOperationAtUtc] conflicts with existing mapper:
Cannot update parameter [format] from [epoch_millis] to [strict_date_optional_time||epoch_millis]
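The error message also shows what dropping `format` would mean: a `date` field without an explicit format uses `strict_date_optional_time||epoch_millis`, and each `||`-separated format is tried in turn. The sketch below is a rough Python approximation of that fallback (ISO 8601 first, then epoch milliseconds). It is a simplification for illustration, not Elasticsearch's parser, and `parse_default_date` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def _parse_iso(value: str) -> datetime:
    # Rough stand-in for strict_date_optional_time.
    if value.lstrip("-").isdigit():
        raise ValueError("plain digits are left to epoch_millis")
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return dt

def _parse_epoch_millis(value: str) -> datetime:
    # Rough stand-in for epoch_millis: an integer number of milliseconds.
    return EPOCH + timedelta(milliseconds=int(value))

def parse_default_date(value) -> datetime:
    """Try each format in order, like 'strict_date_optional_time||epoch_millis'."""
    for parser in (_parse_iso, _parse_epoch_millis):
        try:
            return parser(str(value))
        except ValueError:
            continue
    raise ValueError(f"failed to parse date field [{value}]")
```

Under this approximation both the ISO string and the epoch value parse, which is why a new field without `format` can accept the data that `firstOperationAtUtc` rejects.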

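The only reason the update by query appears to work is that the mapping contains "ignore_malformed": true. If you remove that parameter and run the update by query again, you will see the following error: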

"type" : "mapper_parsing_exception",
"reason" : "failed to parse field [firstOperationAtUtc] of type [date] in document with id '2'. Preview of field's value: '2019-05-03T13:35:52.000Z'",
"caused_by" : {
  "type" : "illegal_argument_exception",
  "reason" : "failed to parse date field [2019-05-03T13:35:52.000Z] with format [epoch_millis]",
  "caused_by" : {
    "type" : "date_time_parse_exception",
    "reason" : "date_time_parse_exception: Failed to parse with all enclosed parsers"
  }
}
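The two outcomes can be modelled in a few lines. In this toy sketch (not Elasticsearch code; `index_date_field` and `exists` are hypothetical names, and `epoch_millis` is approximated as "an integer or a string of digits"), a value that does not parse either raises, like the `mapper_parsing_exception` above, or, with `ignore_malformed`, is left out of the index while its field name is recorded in `_ignored`. `_source` is kept unchanged either way, which matches the question's observation: the document is still found by `_id`, but not by the `exists` query, since that query only matches indexed values.

```python
def index_date_field(source: dict, field: str, ignore_malformed: bool) -> dict:
    """Toy model of indexing one epoch_millis date field of a document."""
    value = source.get(field)
    indexed, ignored = {}, []
    if value is not None:
        text = str(value)
        if text.lstrip("-").isdigit():      # parses as epoch_millis
            indexed[field] = int(text)
        elif ignore_malformed:              # malformed: skip indexing, record it
            ignored.append(field)
        else:                               # malformed: reject the whole document
            raise ValueError(
                f"failed to parse date field [{value}] with format [epoch_millis]"
            )
    # _source is stored as-is in both the indexed and the ignored case.
    return {"indexed": indexed, "_ignored": ignored, "_source": dict(source)}

def exists(doc: dict, field: str) -> bool:
    # Models an exists query: only values that were actually indexed count.
    return field in doc["indexed"]
```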

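To sum up, you have two options:
1. Create a new index with the correct mapping and reindex the old index into it, but that doesn't seem to be an option for you.
2. Create a new field in the existing index mapping (e.g. firstOperationAtUtcTime) and stop using firstOperationAtUtc.

The steps for option 2 are:
1. Modify the index template to add the new field.
2. Modify the actual index mapping to add the new field.
3. Run update by query again, with the script modified to write to the new field.

In short: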

# 1. Modify your index template

# 2. modify your actual index mapping
PUT v2-documents-2021-11-20/_mapping
{
  "properties": {
    "firstOperationAtUtcTime": {
      "ignore_malformed": true,
      "type": "date"
    }
  }
}

# 3. Run update by query again
POST v2-documents-2021-11-20/_update_by_query
{
  "script": {
    "source": "if (ctx._source.firstOperationAtUtcDate != null) { ctx._source.firstOperationAtUtcTime = ctx._source.firstOperationAtUtcDate; ctx._source.remove('firstOperationAtUtc')}",
    "lang": "painless"
  },
  "query": {
    "match": {
      "_id": "9b46d6fe78735274342d1bc539b084510000000455"
    }
  }
}
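For an index with millions of documents, the mapping change and the update by query from the console snippet above (its steps 2 and 3) can also be prepared from a script. The sketch below only builds the requests as (method, path, JSON body) tuples with the Python standard library; it sends nothing. Assumptions not taken from the answer: the update by query targets every document that has `firstOperationAtUtcDate` (an `exists` query instead of a single `_id`), and it adds the `conflicts=proceed`, `slices=auto` and `wait_for_completion=false` query parameters so the job runs as a background task. `migration_requests` is a hypothetical name.

```python
import json
from urllib.parse import urlencode

INDEX = "v2-documents-2021-11-20"

# Same Painless script as in the answer.
SCRIPT = (
    "if (ctx._source.firstOperationAtUtcDate != null) { "
    "ctx._source.firstOperationAtUtcTime = ctx._source.firstOperationAtUtcDate; "
    "ctx._source.remove('firstOperationAtUtc') }"
)

def migration_requests(index: str = INDEX):
    """Build (method, path, body) tuples for the mapping change and the update by query."""
    add_field = {
        "properties": {
            "firstOperationAtUtcTime": {"type": "date", "ignore_malformed": True}
        }
    }
    params = urlencode({"conflicts": "proceed", "slices": "auto",
                        "wait_for_completion": "false"})
    update = {
        "script": {"source": SCRIPT, "lang": "painless"},
        # Assumption: touch every document that has the source field.
        "query": {"exists": {"field": "firstOperationAtUtcDate"}},
    }
    return [
        ("PUT", f"/{index}/_mapping", json.dumps(add_field)),
        ("POST", f"/{index}/_update_by_query?{params}", json.dumps(update)),
    ]

if __name__ == "__main__":
    for method, path, body in migration_requests():
        print(method, path)
        print(body)
```

With `wait_for_completion=false`, Elasticsearch responds with a task ID instead of waiting for the whole index to be processed, and progress can be checked with the tasks API (`GET _tasks/<task_id>`).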
