ElasticSearch: search_as_you_type field, how is it tokenized?

weylhg0b  posted on 2023-06-21 in ElasticSearch

I am reading the official documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html, and I don't understand how the search_as_you_type field works.
Given the following settings:

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "partial_words" : {
          "type": "custom",
          "tokenizer": "ngrams",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "fields": {
          "shingles": { 
            "type": "search_as_you_type",
            "analyzer": "partial_words",
            "term_vector": "with_positions_offsets"
          },
          "ngrams": {
            "type": "text",
            "analyzer": "partial_words",
            "search_analyzer": "standard",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}

I would like to know how my_text.shingles is tokenized. For example, suppose the text

"Martin Luther was a german priest"

is analyzed at index time in the "my_text" field with the "partial_words" analyzer. Which tokens should I expect in each of the following?

1) my_text.shingles
2) my_text.shingles._2gram
3) my_text.shingles._3gram

Thanks for any light you can shed!
Edit: Is there any way (for example, a query) to confirm that the _ngram subfields produce these tokens?

1) my_text.shingles
[Martin, Luther, was, a, german, priest]

2) my_text.shingles._2gram
[Martin Luther, Luther was, was a, a german, german priest]

3) my_text.shingles._3gram
[Martin Luther was, Luther was a, was a german, a german priest]

Answer #1 (yhqotfr8)

See this article for more details. In short, the text is tokenized as shown in the figure below.
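
As a rough illustration of what shingling means, here is a minimal Python sketch. It assumes a word-level analyzer: lowercasing and splitting on whitespace stands in for the standard analyzer on this simple sentence. The helper name `shingles` is made up for this example. With an ngram tokenizer such as "partial_words", the units being joined would be character n-grams rather than whole words, so the real output for that mapping will look different.

```python
def shingles(tokens, size):
    """Join every run of `size` consecutive tokens with a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


text = "Martin Luther was a german priest"

# Assumption: lowercase + whitespace split approximates a word-level analyzer
# for this sentence. It is NOT what the "partial_words" ngram analyzer emits.
tokens = text.lower().split()

root_tokens = tokens                 # what the root search_as_you_type field would hold
two_gram = shingles(tokens, 2)       # analogous to the ._2gram subfield
three_gram = shingles(tokens, 3)     # analogous to the ._3gram subfield

print(root_tokens)
print(two_gram)
print(three_gram)
```

Under that assumption, the 2-word shingles are the five adjacent word pairs and the 3-word shingles are the four adjacent word triples, which matches the shape of the lists in the question apart from lowercasing.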

You can use the _analyze API to see how a text is tokenized.

POST test_search_as_you_type2/_analyze
{
  "analyzer": "partial_words",
  "text": ["Martin Luther was a german priest"]
}
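
The request above shows the output of the "partial_words" analyzer itself. To inspect each subfield separately, one option is the `field` parameter of the _analyze API, which picks the analyzer from the field mapping. The Python sketch below only builds and prints one request per field and does not contact a cluster. The helper name `build_analyze_requests` is hypothetical, and it is an assumption that `field` resolves the `_2gram` and `_3gram` subfields on your Elasticsearch version.

```python
import json

INDEX = "test_search_as_you_type2"  # index name taken from the request above
TEXT = "Martin Luther was a german priest"
FIELDS = [
    "my_text.shingles",
    "my_text.shingles._2gram",
    "my_text.shingles._3gram",
]


def build_analyze_requests(index, text, fields):
    """Return (path, body) pairs, one _analyze request per field.

    Each body uses "field" instead of "analyzer", so Elasticsearch is asked
    to apply the index-time analyzer configured for that field.
    """
    return [(f"/{index}/_analyze", {"field": name, "text": text}) for name in fields]


analyze_requests = build_analyze_requests(INDEX, TEXT, FIELDS)

# Print the requests in console style so they can be pasted into Kibana Dev Tools.
for path, body in analyze_requests:
    print(f"POST {path}")
    print(json.dumps(body, indent=2))
```

Comparing the three responses side by side should show directly which tokens each subfield stores for the sample sentence.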
