Getting strange results from a Solr query

Posted by rvpgvaaj on 2023-10-18 in Solr

I'm using DataStax 6.8. Here is my Solr schema:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="org.apache.solr.schema.TextField" name="NameField">
      <analyzer type="index">
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.NGramFilterFactory" maxGramSize="15" minGramSize="2"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.NGramFilterFactory" maxGramSize="15" minGramSize="2"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="nama" type="StrField"/>
    <field indexed="true" multiValued="false" name="nama_copy" type="NameField"/>
  </fields>
  <uniqueKey>(nama)</uniqueKey>
  <copyField dest="nama_copy" source="nama"/>
</schema>

I have this field value in a row.
Then I ran this query:

http://my_ip_address:8983/solr/search.form/select?wt=json&indent=true&fl=nama&q=nama_copy:batamindo\ v

and I got good results:

{
  "responseHeader":{
    "status":0,
    "QTime":8},
  "response":{"numFound":579,"start":0,"docs":[
      {
        "nama":"BATAMINDO V "},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"},
      {
        "nama":"BATAMINDO V"}]
  }}

But when I ran this query:

http://my_ip_address:8983/solr/search.form/select?wt=json&indent=true&fl=nama&q=nama_copy:batamindo\ vi

I got poor results:

{
  "responseHeader":{
    "status":0,
    "QTime":14},
  "response":{"numFound":602,"start":0,"docs":[
      {
        "nama":"MV. VINCA"},
      {
        "nama":"MV. VINASHIP PEARL"},
      {
        "nama":"MV. VINASHIP PEARL"},
      {
        "nama":"MV. VINCENT TRADER"},
      {
        "nama":"MV. MEGHNA VICTORY"},
      {
        "nama":"MV. MEGHNA VICTORY"},
      {
        "nama":"NAVI SUNNY"},
      {
        "nama":"MV. MEGHNA VICTORY"},
      {
        "nama":"MT. GOLDEN VIOLET"},
      {
        "nama":"MT. GOLDEN VIOLET"}]
  }}

What is going on here?

Answer from kr98yfug:


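What you are seeing is expected behavior.
LowerCaseTokenizerFactory first splits the text into lowercase tokens at non-letter characters, and NGramFilterFactory then breaks each token into n-grams (substrings of length N). With the definition in your schema, every token is split into grams of 2 to 15 characters: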

<filter class="solr.NGramFilterFactory" maxGramSize="15" minGramSize="2"/>

For an input string such as cassandra, the N-gram filter produces the following grams:

  • size=2: ca as ss sa an nd dr ra
  • size=3: cas ass ssa san and ndr dra
  • size=4: cass assa ssan sand andr ndra
  • and so on, up to size=15 (cassandra has only 9 letters, so its longest gram is cassandra itself)
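The gram lists above can be reproduced with a short Python sketch. It is a simplified model of NGramFilterFactory using the schema's minGramSize=2 and maxGramSize=15, not Lucene's actual code: it emits every substring of each allowed length and groups them by size (Lucene produces the same grams, but in a different order). The function name ngrams is hypothetical, chosen for illustration.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> dict[int, list[str]]:
    """Return every n-gram of `token`, grouped by gram size.

    Simplified model of NGramFilterFactory with minGramSize=2 / maxGramSize=15;
    sizes longer than the token itself are skipped.
    """
    grams: dict[int, list[str]] = {}
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        # Slide a window of `size` characters across the token.
        grams[size] = [token[i:i + size] for i in range(len(token) - size + 1)]
    return grams


if __name__ == "__main__":
    for size, grams in ngrams("cassandra").items():
        print(f"size={size}: {' '.join(grams)}")
    # A one-character token is shorter than minGramSize, so nothing is emitted.
    print(ngrams("v"))
```

The last line prints {} because a one-letter token such as v is shorter than minGramSize, so the filter has nothing to emit for it.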

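For the search term ss, Solr matches the indexed gram ss, so every word containing ss is a hit (for example cassandra, whose grams ss, ass, ssa, assa, ssan, etc. all contain it).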
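Likewise, for the search term vi, matches on vinca, vinaship, vincent, victory, navi, violet, etc. are expected. That is what happens with your second query: the query analyzer also applies NGramFilterFactory, so batamindo vi is itself split into grams (ba, at, ..., batamindo, plus vi), and, assuming the default OR operator, a document that shares any one of them (for example only vi) is returned. In batamindo v, the single letter v is shorter than minGramSize=2, so it produces no gram at all and only the grams of batamindo are searched.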
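To see why batamindo v behaves well while batamindo vi does not, here is a rough Python simulation. Assumptions: the tokenizer is modeled as "lowercase, then split on anything that is not an ASCII letter" (an approximation of LowerCaseTokenizerFactory), the stop filter and ASCII folding are omitted, query grams are combined with OR, and scoring is ignored entirely, so it shows which documents match, not how Solr ranks them. The names come from the results in the question; analyze and matches are hypothetical helper names.

```python
import re


def analyze(text: str, min_gram: int = 2, max_gram: int = 15) -> set[str]:
    """Rough model of the NameField analyzer: lowercase letter tokens, then n-grams."""
    grams: set[str] = set()
    # Approximates LowerCaseTokenizerFactory: lowercase and split at non-letters
    # (only ASCII letters are recognized in this sketch).
    for token in re.findall(r"[a-z]+", text.lower()):
        # Approximates NGramFilterFactory: tokens shorter than min_gram add nothing.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                grams.add(token[start:start + size])
    return grams


def matches(query: str, docs: list[str]) -> list[str]:
    """OR semantics: a document matches if it shares at least one gram with the query."""
    query_grams = analyze(query)
    return [doc for doc in docs if analyze(doc) & query_grams]


docs = ["BATAMINDO V", "MT. GOLDEN VIOLET", "NAVI SUNNY", "MV. MEGHNA VICTORY"]

if __name__ == "__main__":
    for q in ("batamindo v", "batamindo vi"):
        # Show which grams the query adds beyond those of "batamindo", and what matches.
        print(q, "->", sorted(analyze(q) - analyze("batamindo")), matches(q, docs))
```

In this model, batamindo v adds no gram beyond those of batamindo and matches only BATAMINDO V, while batamindo vi adds the gram vi and matches all four names. Note that short grams also make the first query broad: MV. VINCA matches even batamindo v in this model, through the bigram in that it shares with batamindo.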
For details, see "Document Analysis in Solr" in the Solr documentation. Cheers!
