I am reading the official documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html and I do not understand how the search_as_you_type field works.
Given the following settings:
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "partial_words": {
          "type": "custom",
          "tokenizer": "ngrams",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "fields": {
          "shingles": {
            "type": "search_as_you_type",
            "analyzer": "partial_words",
            "term_vector": "with_positions_offsets"
          },
          "ngrams": {
            "type": "text",
            "analyzer": "partial_words",
            "search_analyzer": "standard",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}
I would like to know how my_text.shingles is tokenized. For example, suppose the text
"Martin Luther was a german priest"
is analyzed at index time in the "my_text" field with the "partial_words" analyzer. Which tokens should I expect in
1) my_text.shingles
2) my_text.shingles._2gram
3) my_text.shingles._3gram
Thanks for shedding some light on this!
Edit: Is there any way (a query, for example) to verify that the _2gram and _3gram subfields produce these tokens?
1) my_text.shingles
[Martin, Luther, was, a, german, priest]
2) my_text.shingles._2gram
[Martin Luther, Luther was, was a, a german, german priest]
3) my_text.shingles._3gram
[Martin Luther was, Luther was a, was a german, a german priest]
1 Answer
You can check this article for more. Simply put, the text is tokenized into words, as shown in the image below.
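To make the idea concrete, here is a minimal Python sketch of what the _2gram and _3gram subfields conceptually hold. It is not Elasticsearch code: the `shingles` helper is a hypothetical name, and it assumes the base analyzer emits one lowercased token per word (as the standard analyzer would). With the ngram tokenizer from the question's settings, the base tokens would be character n-grams instead, so the real output would differ.

```python
def shingles(tokens, size):
    """Join each run of `size` consecutive tokens with a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Assumption: one lowercased token per word, not the question's ngram tokenizer.
tokens = "Martin Luther was a german priest".lower().split()

for subfield, size in (("my_text.shingles._2gram", 2),
                       ("my_text.shingles._3gram", 3)):
    print(subfield, shingles(tokens, size))
```

Under that assumption the root field keeps the single tokens, _2gram holds every pair of adjacent tokens, and _3gram holds every triple.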
You can use the _analyze API to see how the text is tokenized.
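As a rough sketch of how that check could look, the Python below builds an _analyze request for a given field, so that Elasticsearch applies that field's own index-time analyzer to the text. The cluster address `http://localhost:9200`, the index name `my_index`, and the helper names are assumptions for illustration; only the standard library is used.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX = "my_index"                # hypothetical index created with the settings above
TEXT = "Martin Luther was a german priest"

# Root field plus the subfields that search_as_you_type creates.
FIELDS = [
    "my_text.shingles",
    "my_text.shingles._2gram",
    "my_text.shingles._3gram",
]


def build_analyze_request(index, field, text):
    """Build a POST /<index>/_analyze request that uses the field's analyzer."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(index, field, text):
    """Send the request and return only the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(index, field, text)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]
```

Against a running cluster, calling `analyze(INDEX, field, TEXT)` for each entry in `FIELDS` should list the tokens each field actually produces, which can then be compared with the expected lists in the question. The same call with `my_text.shingles._index_prefix` would show the prefix subfield.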