Elasticsearch word_delimiter_graph: split tokens only on specific delimiters

oymdgrw7 asked on 2023-01-04 in Elasticsearch

I want an Elasticsearch token filter that behaves like word_delimiter_graph but splits tokens only on specific delimiters (if I'm not mistaken, the default word_delimiter_graph does not accept a custom list of delimiters).
For example, I want to split tokens only on the - delimiter:
i-pod -> [i-pod, i, pod]
i_pod -> [i_pod] (because I only want to split on -, not on any other character.)
How can I achieve this?
Thanks!

vaj7vani 1#

I used the type_table parameter. From the Elasticsearch documentation:

(Optional, array of strings) Array of custom type mappings for characters. This allows you to map non-alphanumeric characters as numeric or alphanumeric to avoid splitting on those characters.

For example, the following array maps the plus (+) and hyphen (-) characters as alphanumeric, which means they won’t be treated as delimiters.
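
The array that passage refers to (reproduced from the word_delimiter_graph docs example) is:

"type_table": [ "+ => ALPHA", "- => ALPHA" ]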

Test:

i-pad

GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "preserve_original": true,
      "type_table": [ "_ => ALPHA" ]
    }
  ],
  "text": "i-pad"
}

Tokens:

{
  "tokens": [
    {
      "token": "i-pad",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "pad",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    }
  ]
}

i_pad

GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "preserve_original": true,
      "type_table": [ "_ => ALPHA" ]
    }
  ],
  "text": "i_pad"
}

Tokens:

{
  "tokens": [
    {
      "token": "i_pad",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
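
If this does what you need, the same filter can be baked into an index analyzer instead of being passed to _analyze each time. The following is only a minimal sketch: the index, analyzer, and filter names (my-index, my_analyzer, my_word_delimiter) are placeholders, not anything from the original post. Also note that type_table only exempts the characters you list, so other non-alphanumeric characters (e.g. +) would still trigger a split; add one "<char> => ALPHA" mapping per character you want left intact.

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "my_word_delimiter" ]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter_graph",
          "preserve_original": true,
          "type_table": [ "_ => ALPHA" ]
        }
      }
    }
  }
}

You can then verify the behavior with GET /my-index/_analyze, passing "analyzer": "my_analyzer" and the same test strings.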
