I'm familiar with the shingle analyzer and was able to create a shingle analyzer as follows:
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "filter_shingle"
          ]
        }
      },
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  }
}
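To get a feel for what `filter_shingle` emits (tokenize, lowercase, then 2-word shingles with `output_unigrams: false`), here is a rough Python imitation. The function name `shingles` is hypothetical, and the regex `\w+` is only a crude stand-in for the standard tokenizer, which follows Unicode word-break rules; the `_analyze` API gives the authoritative output.

```python
import re


def shingles(text, size=2):
    """Rough imitation of the analyzer above: min_shingle_size = max_shingle_size = size,
    output_unigrams = false. Not Elasticsearch code; a local approximation only."""
    # Stand-in for standard tokenizer + lowercase filter: lowercase, keep runs of word chars.
    tokens = re.findall(r"\w+", text.lower())
    # Join each window of `size` consecutive tokens with a single space
    # (the shingle filter's default token_separator). No unigrams are emitted.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    print(shingles("Find the latest breaking news and information"))
```

A one-word input yields no shingles at all here, which mirrors `output_unigrams: false`.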
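Then, in the mapping, I use this analyzer for a field named content in my documents. The problem is that the content field holds a very long text, and I want to use it as the data source for an autocomplete suggester, so I only need the one or two words that follow the matched phrase. Is there a way to make the search (or suggest, or analyze) API return the shingles as well? Since Elasticsearch itself indexes the text as shingles when the shingle analyzer is used, is there a way to access those shingles?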
For example, this is the query I send:
GET the_index/_search
{
  "_source": ["content"],
  "explain": true,
  "query": {
    "match": { "content.shngled_field": "news" }
  }
}
The result is:
{
  "took" : 395,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 7.8647532,
    "hits" : [
      {
        "_shard" : "[v3_kavan_telegram_201911][0]",
        "_node" : "L6vHYla-TN6CHo2I6g4M_A",
        "_index" : "v3_kavan_telegram_201911",
        "_type" : "_doc",
        "_id" : "g1music/70733",
        "_score" : 7.8647532,
        "_source" : {
          "content" : "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more."
          ....
        }
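As you can see, the result contains the whole content field, which is a very long text. What I'd like to get instead is something like:
"content" : "news and information on"
i.e., only the matched shingle itself.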
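If post-processing on the client is acceptable, the desired output can be approximated from the returned `content`. This is plain string handling, not an Elasticsearch feature; the function name `complete` is hypothetical and the `\w+` tokenization only approximates the standard tokenizer.

```python
import re


def complete(content, phrase, n=2):
    """Return `phrase` followed by the next n words of `content` (lowercased),
    or None if the phrase does not occur. Client-side sketch only."""
    # Crude tokenization: lowercase, keep runs of word characters (drops punctuation).
    tokens = re.findall(r"\w+", content.lower())
    target = re.findall(r"\w+", phrase.lower())
    k = len(target)
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:  # first occurrence wins
            return " ".join(tokens[i:i + k + n])
    return None


if __name__ == "__main__":
    text = ("Find the latest breaking news and information on the top stories, "
            "weather, business, entertainment, politics, and more.")
    print(complete(text, "news", n=3))
```

With `n=3` and the `content` value from the result above, this yields `news and information on`. It still requires fetching the full text, so it does not reduce response size.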
Answer
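After creating the index and ingesting the documents, you can call _analyze with the corresponding analyzer to see how a given text is tokenized, or look at the term vectors information.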
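The two calls mentioned above are the `_analyze` API (with an `analyzer` and `text` body) and the `_termvectors` API (with a `fields` list). The sketch below only builds the requests so they can be pasted into the Kibana console or sent with any HTTP client. The index name, field name, and document id are copied from the question; the helper names are hypothetical. Note that an id like `g1music/70733` contains `/` and must be percent-encoded in the URL path.

```python
import json
from urllib.parse import quote

# Names copied from the question; adjust to your own index and field.
INDEX = "the_index"
FIELD = "content.shngled_field"


def analyze_request(text, analyzer="shingle_analyzer"):
    """_analyze: run `text` through the index-level analyzer and list the tokens."""
    return "GET", f"/{INDEX}/_analyze", {"analyzer": analyzer, "text": text}


def termvectors_request(doc_id, field=FIELD):
    """_termvectors: list the terms (here, shingles) indexed for one document."""
    # Percent-encode the id: '/' inside an id would otherwise split the URL path.
    path = f"/{INDEX}/_termvectors/{quote(doc_id, safe='')}"
    return "GET", path, {"fields": [field], "positions": True, "offsets": True}


if __name__ == "__main__":
    for method, path, body in (
        analyze_request("breaking news and information"),
        termvectors_request("g1music/70733"),
    ):
        print(f"{method} {path}")
        print(json.dumps(body, indent=2))
```

The `_analyze` call shows which shingles a text produces, while `_termvectors` shows what was actually indexed for a stored document. Neither call filters down to the shingles that matched a query.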
Do you also need highlighting?
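If highlighting is what you are after, a search can return a short fragment around the match instead of the whole text. Below is a minimal sketch that builds such a body and reads the fragments back. `fragment_size` (approximate characters per fragment) and `number_of_fragments` are standard highlight options. The values 30 and 1 are illustrative choices, the field name is copied from the question, and the helper names are hypothetical. Setting `"_source": False` is an assumption that the highlighter can still read the text internally from `_source`. This holds for the default mapping, but verify it against your setup.

```python
import json

# Field name copied from the question's query.
FIELD = "content.shngled_field"


def highlight_search_body(text, field=FIELD, fragment_size=30):
    """Search body asking for one short highlighted fragment per hit."""
    return {
        # Omit the long text from the response (assumption: _source is enabled
        # in the mapping, so the highlighter can still load the field's text).
        "_source": False,
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    "fragment_size": fragment_size,  # approx. characters per fragment
                    "number_of_fragments": 1,        # keep only the best fragment
                }
            }
        },
    }


def fragments(response, field=FIELD):
    """Collect highlight fragments from a parsed _search response dict."""
    return [
        frag
        for hit in response["hits"]["hits"]
        for frag in hit.get("highlight", {}).get(field, [])
    ]


if __name__ == "__main__":
    print(json.dumps(highlight_search_body("news"), indent=2))
```

Fragments come back wrapped in the default `<em>` tags, so strip them on the client if you only need the plain words.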