Implementing collocations in Elasticsearch

Posted by oxf4rvwz on 2021-06-10 in ElasticSearch

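I'm new to Elasticsearch, and I'm trying to build an analyzer or an ingest pipeline that creates word collocations (unigrams, bigrams and trigrams, skipping up to 2 words). I know this can be done in Python, but I'm only interested in an Elasticsearch solution. As far as I got, I tried to do it with shingles, like this: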

GET /_analyze
{
  "tokenizer": "standard",

  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": "token.getPosition() % 2 == 0"
      }
    },
    {
      "type": "shingle",
      "max_shingle_size": 5,
      "min_shingle_size": 3,
      "output_unigrams":false,
      "token_separator":" ",
      "filler_token":""
    },
    "trim",
    "unique",
    {
      "type":"pattern_replace",
      "pattern":"\\s+",
      "replacement":" "
    }

  ],
  "text": "aerial photo airplane taken with a nice camera"
}

This request returns the following output:

{
  "tokens" : [
    {
      "token" : "aerial airplane",
      "start_offset" : 0,
      "end_offset" : 21,
      "type" : "shingle",
      "position" : 0
    },
    {
      "token" : "aerial airplane with",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "airplane",
      "start_offset" : 13,
      "end_offset" : 28,
      "type" : "shingle",
      "position" : 1
    },
    {
      "token" : "airplane with",
      "start_offset" : 13,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "airplane with nice",
      "start_offset" : 13,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 3
    },
    {
      "token" : "with",
      "start_offset" : 28,
      "end_offset" : 35,
      "type" : "shingle",
      "position" : 2
    },
    {
      "token" : "with nice",
      "start_offset" : 28,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 2,
      "positionLength" : 2
    },
    {
      "token" : "nice",
      "start_offset" : 35,
      "end_offset" : 46,
      "type" : "shingle",
      "position" : 3
    }
  ]
}
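Every token in the output above is built only from "aerial", "airplane", "with" and "nice". A likely reason is the first filter in the chain: the Painless condition `token.getPosition() % 2 == 0` keeps only tokens at even positions, so "photo", "taken", "a" and "camera" never reach the shingle filter. Their empty positions are then filled with `filler_token` (here the empty string), which is why `trim` and `pattern_replace` are needed to clean up extra spaces. The following is a minimal pure-Python simulation of that first step only. It assumes that, for this plain ASCII sentence, the standard tokenizer simply splits on whitespace and numbers tokens 0, 1, 2, ...; the helper name `predicate_filter` is hypothetical.

```python
def predicate_filter(text):
    """Simulate the predicate_token_filter step of the request above.

    Assumption: for this plain ASCII sentence the standard tokenizer
    splits on whitespace and assigns positions 0, 1, 2, ...
    """
    tokens = list(enumerate(text.split()))
    # Mirrors the Painless condition token.getPosition() % 2 == 0
    kept = [tok for pos, tok in tokens if pos % 2 == 0]
    dropped = [tok for pos, tok in tokens if pos % 2 != 0]
    return kept, dropped


kept, dropped = predicate_filter("aerial photo airplane taken with a nice camera")
print("kept:", kept)        # the only words that can reach the shingle filter
print("dropped:", dropped)  # their positions are left as gaps in the token stream
```

Here `kept` is `['aerial', 'airplane', 'with', 'nice']`, which matches the words that appear in the shingles returned by `_analyze`.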

But my ideal output would be (tokens only):

['aerial', 'photo', 'airplane', 'taken', 'with', 'a', 'nice', 'camera', ('aerial', 'photo'), ('aerial', 'airplane'), ('aerial', 'taken'), ('photo', 'airplane'), ('photo', 'taken'), ('photo', 'with'), ('airplane', 'taken'), ('airplane', 'with'), ('airplane', 'a'), ('taken', 'with'), ('taken', 'a'), ('taken', 'nice'), ('with', 'a'), ('with', 'nice'), ('with', 'camera'), ('a', 'nice'), ('a', 'camera'), ('nice', 'camera'), ('aerial', 'photo', 'airplane'), ('aerial', 'photo', 'taken'), ('aerial', 'photo', 'with'), ('aerial', 'airplane', 'taken'), ('aerial', 'airplane', 'with'), ('aerial', 'taken', 'with'), ('photo', 'airplane', 'taken'), ('photo', 'airplane', 'with'), ('photo', 'airplane', 'a'), ('photo', 'taken', 'with'), ('photo', 'taken', 'a'), ('photo', 'with', 'a'), ('airplane', 'taken', 'with'), ('airplane', 'taken', 'a'), ('airplane', 'taken', 'nice'), ('airplane', 'with', 'a'), ('airplane', 'with', 'nice'), ('airplane', 'a', 'nice'), ('taken', 'with', 'a'), ('taken', 'with', 'nice'), ('taken', 'with', 'camera'), ('taken', 'a', 'nice'), ('taken', 'a', 'camera'), ('taken', 'nice', 'camera'), ('with', 'a', 'nice'), ('with', 'a', 'camera'), ('with', 'nice', 'camera'), ('a', 'nice', 'camera')]
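The target list above appears to be the k-skip-n-grams of the sentence for n = 1 to 3 with k = 2: each n-gram keeps word order and skips at most two words in total between its first and last word. (In Python, NLTK provides `nltk.util.skipgrams` for n-grams of this kind.) The following is a minimal pure-Python sketch, not an Elasticsearch solution, that spells out this definition. It assumes whitespace tokenization, and the helper names `skipgrams` and `collocations` are hypothetical. On the example sentence it reproduces the ideal list above, in the same order.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Yield k-skip-n-grams: n tokens in order, skipping at most k tokens in total."""
    for i, head in enumerate(tokens):
        # Any valid gram starting at i must end within the next n - 1 + k tokens.
        window = tokens[i + 1 : i + n + k]
        for tail in combinations(window, n - 1):
            yield (head, *tail)


def collocations(text, max_n=3, k=2):
    """Unigrams as plain strings, longer grams as tuples, grouped by n."""
    tokens = text.split()  # assumption: whitespace tokenization is enough here
    result = []
    for n in range(1, max_n + 1):
        for gram in skipgrams(tokens, n, k):
            result.append(gram[0] if n == 1 else gram)
    return result


grams = collocations("aerial photo airplane taken with a nice camera")
print(len(grams))  # 8 unigrams + 18 bigrams + 28 trigrams = 54
print(grams)
```

Shingles are always built from consecutive positions. A pair such as ('aerial', 'taken') skips two words, while ('aerial', 'photo') skips none, so each skip pattern corresponds to a different subset of positions. Removing every other token before shingling, as the request above does, produces only one fixed pattern.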
