Elasticsearch edge_ngram explanation

k4aesqcs, posted 2023-02-11 in ElasticSearch

Please explain what happens in the following case:
1. Index creation

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X PUT "https://localhost:9200/tstind?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "analysis": {
            "filter": {
                "cna_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 10
                }
            },
            "analyzer": {
                "cna": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "cna_edge_ngram"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "cn": {
                "type": "text",
                "analyzer": "cna",
                "fielddata": "true"
            }
        }
    }
}
'

2. Test data

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X POST "https://localhost:9200/tstind/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index": {"_index": "tstind", "_id": "1"}}
{"cn": "carrot"}
{"index": {"_index": "tstind", "_id": "2"}}
{"cn": "apple banana"}
{"index": {"_index": "tstind", "_id": "3"}}
{"cn": "redapple apple orange"}
{"index": {"_index": "tstind", "_id": "4"}}
{"cn": "orange"}
{"index": {"_index": "tstind", "_id": "5"}}
{"cn": "apple"}
{"index": {"_index": "tstind", "_id": "6"}}
{"cn": "cucumber"}
'

3. Request

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/tstind/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "cn": {"query": "appls"}
        }
    }
}
'

4. Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.246899,
    "hits" : [
      {
        "_index" : "tstind",
        "_id" : "5",
        "_score" : 1.246899,
        "_source" : {
          "cn" : "apple"
        }
      },
      {
        "_index" : "tstind",
        "_id" : "2",
        "_score" : 1.1766877,
        "_source" : {
          "cn" : "apple banana"
        }
      },
      {
        "_index" : "tstind",
        "_id" : "3",
        "_score" : 1.113962,
        "_source" : {
          "cn" : "redapple apple orange"
        }
      }
    ]
  }
}

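I expected the result to be empty, with no hits. Why do the results contain documents that do not include the requested term "appls"?
I tried using _analyze to see which tokens exist in the index: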

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/tstind/_analyze?pretty" -H 'Content-Type: application/json' -d'
{ 
  "analyzer": "cna",
  "text" : "apple banana"
}
'
{
  "tokens" : [
    {
      "token" : "app",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "appl",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "apple",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ban",
      "start_offset" : 6,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "bana",
      "start_offset" : 6,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "banan",
      "start_offset" : 6,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "banana",
      "start_offset" : 6,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    }
  ]
}

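Everything looks correct, yet the result is not what I expected. Please explain what is happening in this case.
I found a way to filter the results with post_filter, but I don't think that is the best idea.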

zvms9eto (answer #1)

In this case, look at the tokens produced for the term "appls":

{
  "tokens": [
    {
      "token": "app",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "appl",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "appls",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
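
The token list above can be reproduced offline. The following is a minimal sketch in Python that approximates the `cna` analyzer from the question (whitespace tokenizer, lowercase filter, edge n-grams of length 3 to 10). The function name `cna_tokens` is hypothetical, and this is only a simulation of the analysis chain, not Elasticsearch code.

```python
def cna_tokens(text, min_gram=3, max_gram=10):
    """Approximate the question's `cna` analyzer (assumption: a plain
    simulation, not the real Lucene implementation)."""
    tokens = []
    for word in text.split():          # whitespace tokenizer
        word = word.lower()            # lowercase filter
        # Edge n-grams are prefixes only. In this sketch a word shorter
        # than min_gram yields no token.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(cna_tokens("appls"))         # ['app', 'appl', 'appls']
print(cna_tokens("apple banana"))
```

The second call yields the same seven tokens that the `_analyze` request in the question returned for "apple banana".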

Now the tokens for the term "apple":

{
  "tokens": [
    {
      "token": "app",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "appl",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "apple",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}

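The documents match because the tokens "app" and "appl" are produced both for the query term "appls" and for the indexed text "apple". A match query analyzes the query text with the field's analyzer unless a separate search_analyzer is configured, and by default it returns any document that shares at least one of the resulting tokens.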
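
The overlap can be checked with a small simulation, and the same sketch shows one common way to avoid it: keep the edge n-gram analyzer for indexing but analyze the query without n-grams by setting `search_analyzer` on the field. The analyzer name `cna_search` and all helper names below are assumptions made for illustration. The code models only term overlap, not Elasticsearch scoring.

```python
import json

# The six test documents from the question.
DOCS = {"1": "carrot", "2": "apple banana", "3": "redapple apple orange",
        "4": "orange", "5": "apple", "6": "cucumber"}


def index_tokens(text, min_gram=3, max_gram=10):
    """Simulated `cna` analyzer: whitespace, lowercase, edge n-grams."""
    return {w.lower()[:n]
            for w in text.split()
            for n in range(min_gram, min(max_gram, len(w)) + 1)}


def plain_tokens(text):
    """Simulated search analyzer without n-grams: whitespace, lowercase."""
    return {w.lower() for w in text.split()}


def match(query, search_analyzer):
    """Return ids of documents sharing at least one term with the query
    (the default OR behaviour of a match query)."""
    terms = search_analyzer(query)
    return sorted(i for i, text in DOCS.items() if terms & index_tokens(text))


print(match("appls", index_tokens))  # same analyzer at search time
print(match("appls", plain_tokens))  # no n-grams at search time
print(match("appl", plain_tokens))   # prefix search still works

# Hypothetical index body with a separate search analyzer ("cna_search"
# is an invented name; the rest mirrors the question's settings).
index_body = {
    "settings": {"analysis": {
        "filter": {"cna_edge_ngram": {"type": "edge_ngram",
                                      "min_gram": 3, "max_gram": 10}},
        "analyzer": {
            "cna": {"type": "custom", "tokenizer": "whitespace",
                    "filter": ["lowercase", "cna_edge_ngram"]},
            "cna_search": {"type": "custom", "tokenizer": "whitespace",
                           "filter": ["lowercase"]}}}},
    "mappings": {"properties": {"cn": {
        "type": "text", "analyzer": "cna",
        "search_analyzer": "cna_search", "fielddata": True}}},
}
print(json.dumps(index_body, indent=2))
```

With the query analyzed into the single term "appls", no indexed token matches, while a genuine prefix such as "appl" still finds the three apple documents. The printed JSON could be sent as the body of the index-creation request in place of the original one.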
