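I'm new to Elasticsearch, and I'm trying to build an analyzer or an ingest pipeline that produces word collocations: unigrams, bigrams, and trigrams, allowing up to 2 skipped words. I know this can be done in Python, but I'm only interested in an Elasticsearch solution. As far as I understand it, shingles are the closest fit, so I tried this: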
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": "token.getPosition() % 2 == 0"
      }
    },
    {
      "type": "shingle",
      "max_shingle_size": 5,
      "min_shingle_size": 3,
      "output_unigrams": false,
      "token_separator": " ",
      "filler_token": ""
    },
    "trim",
    "unique",
    {
      "type": "pattern_replace",
      "pattern": "\\s+",
      "replacement": " "
    }
  ],
  "text": "aerial photo airplane taken with a nice camera"
}
It gives me this output:
{
  "tokens" : [
    {
      "token" : "aerial airplane",
      "start_offset" : 0,
      "end_offset" : 21,
      "type" : "shingle",
      "position" : 0
    },
    {
      "token" : "aerial airplane with",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "airplane",
      "start_offset" : 13,
      "end_offset" : 28,
      "type" : "shingle",
      "position" : 1
    },
    {
      "token" : "airplane with",
      "start_offset" : 13,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "airplane with nice",
      "start_offset" : 13,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 3
    },
    {
      "token" : "with",
      "start_offset" : 28,
      "end_offset" : 35,
      "type" : "shingle",
      "position" : 2
    },
    {
      "token" : "with nice",
      "start_offset" : 28,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 2,
      "positionLength" : 2
    },
    {
      "token" : "nice",
      "start_offset" : 35,
      "end_offset" : 46,
      "type" : "shingle",
      "position" : 3
    }
  ]
}
But my ideal output would be (tokens only):
['aerial', 'photo', 'airplane', 'taken', 'with', 'a', 'nice', 'camera', ('aerial', 'photo'), ('aerial', 'airplane'), ('aerial', 'taken'), ('photo', 'airplane'), ('photo', 'taken'), ('photo', 'with'), ('airplane', 'taken'), ('airplane', 'with'), ('airplane', 'a'), ('taken', 'with'), ('taken', 'a'), ('taken', 'nice'), ('with', 'a'), ('with', 'nice'), ('with', 'camera'), ('a', 'nice'), ('a', 'camera'), ('nice', 'camera'), ('aerial', 'photo', 'airplane'), ('aerial', 'photo', 'taken'), ('aerial', 'photo', 'with'), ('aerial', 'airplane', 'taken'), ('aerial', 'airplane', 'with'), ('aerial', 'taken', 'with'), ('photo', 'airplane', 'taken'), ('photo', 'airplane', 'with'), ('photo', 'airplane', 'a'), ('photo', 'taken', 'with'), ('photo', 'taken', 'a'), ('photo', 'with', 'a'), ('airplane', 'taken', 'with'), ('airplane', 'taken', 'a'), ('airplane', 'taken', 'nice'), ('airplane', 'with', 'a'), ('airplane', 'with', 'nice'), ('airplane', 'a', 'nice'), ('taken', 'with', 'a'), ('taken', 'with', 'nice'), ('taken', 'with', 'camera'), ('taken', 'a', 'nice'), ('taken', 'a', 'camera'), ('taken', 'nice', 'camera'), ('with', 'a', 'nice'), ('with', 'a', 'camera'), ('with', 'nice', 'camera'), ('a', 'nice', 'camera')]
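To pin down the rule behind that list: it looks like each n-gram starts at some word and picks its remaining n-1 words, in order, from the next n+k-1 words, with k = 2 allowed skips. Below is a minimal Python sketch of that rule, only as a reference for the expected output and not as the solution I'm after. It assumes plain whitespace tokenization, and the names `skipgrams` and `collocations` are just illustrative helpers.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Yield n-grams that start at each token and draw the remaining
    n-1 tokens, in order, from the following n+k-1 tokens."""
    for i, head in enumerate(tokens):
        tail = tokens[i + 1 : i + n + k]  # window after the head token
        for rest in combinations(tail, n - 1):
            yield (head, *rest)


def collocations(text, max_n=3, k=2):
    """Return unigrams (as plain strings) followed by skip-grams of
    size 2..max_n with at most k skipped tokens in total."""
    tokens = text.split()  # assumption: whitespace tokenization
    result = list(tokens)
    for n in range(2, max_n + 1):
        result.extend(skipgrams(tokens, n, k))
    return result


if __name__ == "__main__":
    for item in collocations("aerial photo airplane taken with a nice camera"):
        print(item)
```

With `max_n=3` and `k=2`, a bigram can span at most 4 consecutive positions and a trigram at most 5, which is why `('aerial', 'taken')` is in the list but `('aerial', 'with')` is not.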