In Elasticsearch, is there a character limit for a single word in a match_phrase query?

Posted by c2e8gylq on 2021-06-10 in ElasticSearch

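I'm fairly new to Elasticsearch, so feel free to be blunt with me. I'm having an issue where a document is found if I search for it using a word of 20 characters or fewer, but if the same word in the query has more characters, I get no results:
Using the full word "phenoxymethylpenicillin" (23 characters) brings back no documents.
Using the same word truncated to 20 characters or fewer brings the document back.
This is the query I'm attempting to use: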

{
    "match_phrase": {
        "genericNames.name": {
            "query": "phenoxymethylpenicillin",
            "slop": 15,
            "zero_terms_query": "NONE",
            "boost": 1.0
        }
    }
}

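Here is the full query: https://pastebin.com/dejvp2us
As I said, I'm fairly new to this, so I may simply not be looking in the right place.
So my question is: what could be causing this, and why?
Thanks!
Edit: The following is an extract from one document in the sample data. I can't show much because it is sensitive, but fortunately the names are among the sample data I can share. This is the data I am trying to search: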

"genericNames":[
{
    "nameType":1,
    "name":"Phenoxymethylpenicillin 250mg tablets",
    "nameChangeCode":"0000",
    "nameBasisCode":"0001",
    "nameTypeDescription":"Name",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
},
{
    "nameType":5,
    "name":"Penicillin V 250mg tablets",
    "nameTypeDescription":"Alternative Name 3",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
}
],

I have also included the index mapping, as it may provide additional information:

{
    "amp": {
        "mappings": {
            "properties": {
                "_class": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "ampId": {
                    "type": "long"
                },
                "amppId": {
                    "type": "long"
                },
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "attributeQualifier": {
                            "type": "keyword"
                        },
                        "attributeType": {
                            "type": "integer"
                        },
                        "attributeTypeDescription": {
                            "type": "keyword"
                        },
                        "attributeValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "countryId": {
                            "type": "long"
                        },
                        "decodedValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "dictionaries": {
                    "type": "nested",
                    "properties": {
                        "abbreviation": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "description": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "dictId": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "endDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "excipients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "extractableEntry": {
                    "type": "boolean"
                },
                "genericNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "id": {
                    "type": "keyword"
                },
                "ingredients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "invalidEntry": {
                    "type": "boolean"
                },
                "pitId": {
                    "type": "integer"
                },
                "ppaCodes": {
                    "type": "nested",
                    "properties": {
                        "code": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "proprietaryNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "qpuUomCde": {
                    "type": "keyword"
                },
                "qpuVal": {
                    "type": "keyword"
                },
                "qtyUomCde": {
                    "type": "keyword"
                },
                "qtyVal": {
                    "type": "keyword"
                },
                "snomedCodes": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "snomedDescriptions": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "startDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "suppliers": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "names": {
                            "type": "nested",
                            "properties": {
                                "endDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    },
                                    "analyzer": "autocomplete_index",
                                    "search_analyzer": "autocomplete_search"
                                },
                                "nameBasisCode": {
                                    "type": "keyword"
                                },
                                "nameChangeCode": {
                                    "type": "keyword"
                                },
                                "nameType": {
                                    "type": "integer"
                                },
                                "nameTypeDescription": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "udfs": {
                    "type": "nested",
                    "properties": {
                        "ddIndicator": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "udfsUomCode": {
                            "type": "keyword"
                        },
                        "udfsValue": {
                            "type": "keyword"
                        },
                        "vmpUomCode": {
                            "type": "keyword"
                        }
                    }
                },
                "vmpId": {
                    "type": "long"
                },
                "vmppId": {
                    "type": "long"
                },
                "vtms": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                }
            }
        }
    }
}

Edit: added a link to the full query: https://pastebin.com/dejvp2us
Edit: index settings:

{
    "index": {
        "max_ngram_diff": "20",
        "analysis": {
            "filter": {
                "autocomplete_suffix_filter": {
                    "type": "ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "filter": [
                        "lowercase",
                        "autocomplete_filter",
                        "autocomplete_suffix_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                },
                "autocomplete_search": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            }
        },
        "number_of_replicas": "1"
    }
}

kninwzqo1#

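In the index mapping provided above, genericNames is of nested type, so you need to use a nested query.
Here is a working example that uses the same index data as provided above, together with the search query and the search result.
Search query: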

{
  "query": {
    "nested": {
      "path": "genericNames",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "genericNames.name": "phenoxymethylpenicillin"
              }
            }
          ]
        }
      },
      "inner_hits":{}
    }
  }
}

Search result:

"hits": [
                {
                  "_index": "64817981",
                  "_type": "_doc",
                  "_id": "1",
                  "_nested": {
                    "field": "genericNames",
                    "offset": 0
                  },
                  "_score": 0.7361701,
                  "_source": {
                    "nameType": 1,
                    "name": "Phenoxymethylpenicillin 250mg tablets",
                    "nameChangeCode": "0000",
                    "nameBasisCode": "0001",
                    "nameTypeDescription": "Name",
                    "startDate": "1948-01-01T00:00:00.000000+0000",
                    "endDate": "3456-02-01T00:00:00.000000+0000"
                  }
                }
              ]
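
The same nested wrapper can also carry the match_phrase clause from the question instead of a plain match. The following is only a sketch in Python that builds the request body as a dictionary; the helper name nested_match_phrase is hypothetical, and it is an assumption that the full query behind the pastebin link does not already use a nested clause. Wrapping the clause this way addresses the nested mapping only; it does not change how the field is analyzed.

```python
import json


def nested_match_phrase(path, field, phrase, slop=0):
    """Build a search body that runs match_phrase inside a nested query.

    `path` is the nested object (e.g. "genericNames") and `field` is the
    sub-field inside it (e.g. "name"). Hypothetical helper for illustration.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "match_phrase": {
                        f"{path}.{field}": {"query": phrase, "slop": slop}
                    }
                },
                # Return the matching nested objects alongside the parent hit.
                "inner_hits": {},
            }
        }
    }


body = nested_match_phrase(
    "genericNames", "name", "phenoxymethylpenicillin", slop=15
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.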

sy5wg1nm2#

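This must be happening because of the custom analyzers on your genericNames.name field. You use different custom analyzers at index time (autocomplete_index) and at search time (autocomplete_search), but the definitions of these analyzers are not provided in the question; only the mapping is.
Please provide the output of the _settings API on your index; see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more information.
You need to check the tokens generated for phenoxymethylpenicillin using the analyze API with both autocomplete_index and autocomplete_search, and you will notice the difference.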
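
To make that difference concrete without a running cluster, here is a rough pure-Python model of the two analyzers, using the filter settings shown in the question (edge_ngram and ngram, both with min_gram 1 and max_gram 20). It is a simplified sketch, not Elasticsearch's implementation: the standard tokenizer is approximated by splitting on whitespace, and all function names are made up for illustration. The real tokens should still be checked with the analyze API.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of `token` of length min_gram..max_gram (edge_ngram-like)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngrams(token, min_gram=1, max_gram=20):
    """Substrings of `token` of length min_gram..max_gram (ngram-like)."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


def index_terms(text):
    """Model of autocomplete_index: lowercase, edge_ngram, then ngram."""
    terms = set()
    for word in text.lower().split():  # assumption: whitespace ~ standard tokenizer
        for prefix in edge_ngrams(word):
            terms.update(ngrams(prefix))
    return terms


def search_terms(text, truncate=None):
    """Model of autocomplete_search: lowercase only, optional truncation."""
    words = text.lower().split()
    return [w[:truncate] for w in words] if truncate else words


def matches(query, indexed, truncate=None):
    """True if every search term exists among the indexed terms."""
    return all(t in indexed for t in search_terms(query, truncate))


indexed = index_terms("Phenoxymethylpenicillin 250mg tablets")
print(max(len(t) for t in indexed))                            # 20
print(matches("phenoxymethylpenicillin", indexed))             # False
print(matches("phenoxymethylpenicil", indexed))                # True
print(matches("phenoxymethylpenicillin", indexed, truncate=20))  # True
```

In this model no indexed term is longer than 20 characters, so the 23-character search term has nothing to match, while any prefix of 20 characters or fewer does. The last line mimics adding a truncate token filter with length 20 to the search analyzer, which the Elasticsearch documentation for the edge_ngram filter mentions as an option when max_gram is shorter than the search terms; raising max_gram is another possibility. Either change may affect relevance, so treat them as things to try rather than a confirmed fix for this index.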
