Please explain what happens in the following case:
1. Index creation
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X PUT "https://localhost:9200/tstind?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "cna_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "cna": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "cna_edge_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "cn": {
        "type": "text",
        "analyzer": "cna",
        "fielddata": "true"
      }
    }
  }
}
'
2. Test data
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X POST "https://localhost:9200/tstind/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index": {"_index": "tstind", "_id": "1"}}
{"cn": "carrot"}
{"index": {"_index": "tstind", "_id": "2"}}
{"cn": "apple banana"}
{"index": {"_index": "tstind", "_id": "3"}}
{"cn": "redapple apple orange"}
{"index": {"_index": "tstind", "_id": "4"}}
{"cn": "orange"}
{"index": {"_index": "tstind", "_id": "5"}}
{"cn": "apple"}
{"index": {"_index": "tstind", "_id": "6"}}
{"cn": "cucumber"}
'
3. Query
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/tstind/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "cn": {"query": "appls"}
    }
  }
}
'
4. Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.246899,
"hits" : [
{
"_index" : "tstind",
"_id" : "5",
"_score" : 1.246899,
"_source" : {
"cn" : "apple"
}
},
{
"_index" : "tstind",
"_id" : "2",
"_score" : 1.1766877,
"_source" : {
"cn" : "apple banana"
}
},
{
"_index" : "tstind",
"_id" : "3",
"_score" : 1.113962,
"_source" : {
"cn" : "redapple apple orange"
}
}
]
}
}
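I expected the result to be empty, with no hits. Why does the result contain documents that do not include the query term "appls"?
I tried using _analyze to investigate which tokens exist in the index: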
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/tstind/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "cna",
  "text": "apple banana"
}
'
{
"tokens" : [
{
"token" : "app",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "appl",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "apple",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "ban",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 1
},
{
"token" : "bana",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 1
},
{
"token" : "banan",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 1
},
{
"token" : "banana",
"start_offset" : 6,
"end_offset" : 12,
"type" : "word",
"position" : 1
}
]
}
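Everything looks fine, but the result is not what I expected. Please explain what is happening in this case.
I found a way to filter the results with post_filter, but I don't think that is the best idea.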
1 Answer
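In this case, look at the tokens produced for the query term "appls". The cn field has no separate search_analyzer, so the match query analyzes the query text with the same cna analyzer that was used at index time. That gives the tokens: app, appl, appls
Now, the tokens indexed for the document "apple" are: app, appl, apple
It is a match because the tokens "app" and "appl" exist both in the query term "appls" and in the document "apple". A match query combines its tokens with OR by default, so a single shared token is enough for a hit.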
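The overlap can be reproduced without a cluster. The following is only a rough Python approximation of the cna analyzer (whitespace tokenizer, lowercase, edge_ngram 3..10), not Elasticsearch's actual implementation; the `analyze` helper and the `cna_search` analyzer name are made up for illustration. It also sketches one common way to avoid the match, assuming you want the query text left un-n-grammed: give the field a `search_analyzer` that skips the edge_ngram filter.

```python
import json


def analyze(text, edge_ngram=True, min_gram=3, max_gram=10):
    """Approximate the cna analyzer: whitespace tokenizer, lowercase, edge_ngram."""
    tokens = []
    for word in text.split():
        word = word.lower()
        if not edge_ngram:
            tokens.append(word)
            continue
        # Emit prefixes of length min_gram..max_gram, capped at the word length.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


# Tokens stored in the index for the document "apple".
index_tokens = set(analyze("apple"))

# Query analyzed with the same analyzer (what happens in the question).
query_tokens = analyze("appls")
shared = sorted(index_tokens.intersection(query_tokens))
print(shared)  # ['app', 'appl']

# Query analyzed without edge n-grams: the only token is "appls".
plain_query = analyze("appls", edge_ngram=False)
print(sorted(index_tokens.intersection(plain_query)))  # []

# Index body with a separate search-time analyzer.
# "cna_search" is a hypothetical name chosen for this sketch.
fixed_body = {
    "settings": {
        "analysis": {
            "filter": {
                "cna_edge_ngram": {"type": "edge_ngram", "min_gram": 3, "max_gram": 10}
            },
            "analyzer": {
                "cna": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "cna_edge_ngram"],
                },
                "cna_search": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "cn": {
                "type": "text",
                "analyzer": "cna",
                "search_analyzer": "cna_search",
                "fielddata": True,
            }
        }
    },
}
print(json.dumps(fixed_body, indent=2))
```

The printed JSON could be sent as the body of the PUT request from the question when recreating the index. With such a mapping, the query "appls" would be looked up as the single token "appls", which none of the test documents contain, while a prefix such as "appl" would still match.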