pattern_replace filter does not replace the pattern in the text

Posted by p4rjhz4m on 2021-06-13 in ElasticSearch

As described in this article, I built the following analyzer request:

GET /_analyze
{
  "char_filter" : ["html_strip"],
  "tokenizer" : "whitespace",
  "filter" : [
              "lowercase",
              {
               "type" : "pattern_replace",
               "pattern" : "(([0-9]{1,2}|[0-9]º|primeiro)( de )?(janeiro|fevereiro|março|abril|maio|junho|julho|agosto|setembro|outubro|novembro|dezembro))( de [0-9]{2,4})?",
               "replacement": "date"
              },
              "asciifolding",
              {"type": "stop", "stopwords": "_portuguese_"}],
  "text" : "<h1>olá tudo bem?</h1> 1º de janeiro, 25 de fevereiro, 15 de dezembro, primeiro de abril de 2020, 2 de junho de 2018"
}

But the resulting tokens show no replacement:

{'tokens': [{'token': 'ola',
   'start_offset': 4,
   'end_offset': 7,
   'type': 'word',
   'position': 0},
  {'token': 'tudo',
   'start_offset': 8,
   'end_offset': 12,
   'type': 'word',
   'position': 1},
  {'token': 'bem?',
   'start_offset': 13,
   'end_offset': 17,
   'type': 'word',
   'position': 2},
  {'token': '1º',
   'start_offset': 23,
   'end_offset': 25,
   'type': 'word',
   'position': 3},
  {'token': 'janeiro,',
   'start_offset': 29,
   'end_offset': 37,
   'type': 'word',
   'position': 5},
  {'token': '25',
   'start_offset': 38,
   'end_offset': 40,
   'type': 'word',
   'position': 6},
  {'token': 'fevereiro,',
   'start_offset': 44,
   'end_offset': 54,
   'type': 'word',
   'position': 8},
  {'token': '15',
   'start_offset': 55,
   'end_offset': 57,
   'type': 'word',
   'position': 9},
  {'token': 'dezembro,',
   'start_offset': 61,
   'end_offset': 70,
   'type': 'word',
   'position': 11},
  {'token': 'primeiro',
   'start_offset': 71,
   'end_offset': 79,
   'type': 'word',
   'position': 12},
  {'token': 'abril',
   'start_offset': 83,
   'end_offset': 88,
   'type': 'word',
   'position': 14},
  {'token': '2020,',
   'start_offset': 92,
   'end_offset': 97,
   'type': 'word',
   'position': 16},
  {'token': '2',
   'start_offset': 98,
   'end_offset': 99,
   'type': 'word',
   'position': 17},
  {'token': 'junho',
   'start_offset': 103,
   'end_offset': 108,
   'type': 'word',
   'position': 19},
  {'token': '2018',
   'start_offset': 112,
   'end_offset': 116,
   'type': 'word',
   'position': 21}]}

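The documentation carries a warning: "The pattern_replace filter uses Java's regular expression syntax."
So I tested here whether my regex is valid under that syntax, and it matched, as the screenshot below shows: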

Does anyone know what could be going wrong?
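
One thing that may be worth checking: token filters such as pattern_replace run on each token produced by the tokenizer, not on the whole text. The sketch below uses Python's `re` module as a stand-in for Java's regex engine (an assumption; the two agree for this pattern) and approximates the whitespace tokenizer and lowercase filter with `str.lower().split()`. It compares applying the regex to the full string with applying it token by token.

```python
import re

# Same pattern as in the _analyze request, split across lines for readability.
PATTERN = re.compile(
    r"(([0-9]{1,2}|[0-9]º|primeiro)( de )?"
    r"(janeiro|fevereiro|março|abril|maio|junho|julho|agosto|"
    r"setembro|outubro|novembro|dezembro))"
    r"( de [0-9]{2,4})?"
)

text = ("1º de janeiro, 25 de fevereiro, 15 de dezembro, "
        "primeiro de abril de 2020, 2 de junho de 2018")

# Case 1: regex applied to the whole string (what an online regex tester does).
whole = PATTERN.sub("date", text)

# Case 2: regex applied to each token separately. This only approximates
# the whitespace tokenizer followed by the lowercase filter.
tokens = text.lower().split()
per_token = [PATTERN.sub("date", tok) for tok in tokens]

print(whole)                # every date expression is replaced
print(per_token == tokens)  # True: no single token matches the pattern
```

Because the pattern spans several whitespace-separated words ("25", "de", "fevereiro,"), no individual token can match it. A possible direction, offered as an assumption rather than a confirmed fix, is to move the pattern into a `pattern_replace` character filter in the `char_filter` list, since character filters run on the text before tokenization.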