Elasticsearch word_delimiter_graph: split tokens only on specific delimiters

oymdgrw7 asked on 2023-01-04 in Elasticsearch

I want an Elasticsearch token filter that behaves like word_delimiter_graph but splits tokens only on specific delimiters (if I'm not mistaken, the default word_delimiter_graph does not accept a custom list of delimiters).
For example, I want to split tokens only on the - delimiter:
i-pod -> [i-pod, i, pod]
i_pod -> [i_pod] (because I only want to split on -, not on any other character.)
How can I achieve this?
Thanks!

vaj7vani 1#

I used the type_table parameter. From the Elasticsearch documentation:

(Optional, array of strings) Array of custom type mappings for characters. This allows you to map non-alphanumeric characters as numeric or alphanumeric to avoid splitting on those characters.

For example, the following array maps the plus (+) and hyphen (-) characters as alphanumeric, which means they won’t be treated as delimiters.
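
The array that passage refers to (reproduced from the word_delimiter_graph docs example) is:

"type_table": [ "+ => ALPHA", "- => ALPHA" ]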

Test:

i-pad

GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "preserve_original": true,
      "type_table": [ "_ => ALPHA" ]
    }
  ],
  "text": "i-pad"
}

Tokens:

{
  "tokens": [
    {
      "token": "i-pad",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "pad",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    }
  ]
}

i_pad

GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter_graph",
      "preserve_original": true,
      "type_table": [ "_ => ALPHA" ]
    }
  ],
  "text": "i_pad"
}

Tokens:

{
  "tokens": [
    {
      "token": "i_pad",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
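
If this does what you need, the same filter can be baked into an index analyzer instead of being passed to _analyze each time. The following is only a minimal sketch: the index, analyzer, and filter names (my-index, my_analyzer, my_word_delimiter) are placeholders, not anything from the original post. Also note that type_table only exempts the characters you list, so other non-alphanumeric characters (e.g. +) would still trigger a split; add one "<char> => ALPHA" mapping per character you want left intact.

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "my_word_delimiter" ]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter_graph",
          "preserve_original": true,
          "type_table": [ "_ => ALPHA" ]
        }
      }
    }
  }
}

You can then verify the behavior with GET /my-index/_analyze, passing "analyzer": "my_analyzer" and the same test strings.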
