Solr does not return existing results [duplicate]

Posted by 93ze6v8z on 2022-11-23 in Solr

This question already has answers here:

Solr wildcard search incorrect result (4 answers)
Closed 13 days ago.
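I hope you can help me, because this problem is driving me crazy.
To keep it simple: I have some documents with a field named name_text_de_de, containing values like these: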

name_text_de_de
Industrie-Reiniger
Katalysator-Reiniger
Flächenreiniger
UNIVERSALREINIGER
FELGENREINIGER-GEL

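These are not all of the values, only a sample. With the query q=name_text_de_de:*reinig I get the results above, but with q=name_text_de_de:*reiniger I get no results at all, which makes no sense to me.
What could be the problem here?
Thanks in advance,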
菲德

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
                <filter class="solr.ManagedStopFilterFactory" managed="de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
                <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
                <!-- <filter class="solr.GermanStemFilterFactory" /> -->
                <!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
                <filter class="solr.GermanMinimalStemFilterFactory" />
                <!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
                <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
                <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
                <!-- <filter class="solr.GermanStemFilterFactory" /> -->
                <!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
                <filter class="solr.GermanMinimalStemFilterFactory" />
                <!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldType name="text_de_de" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
                <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
                <!-- <filter class="solr.GermanStemFilterFactory" /> -->
                <!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
                <filter class="solr.GermanMinimalStemFilterFactory" />
                <!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
                <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
                <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
                <!-- <filter class="solr.GermanStemFilterFactory" /> -->
                <!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
                <filter class="solr.GermanMinimalStemFilterFactory" />
                <!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldType name="text_spell_de" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldType name="text_spell_de_de" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
                <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>

drnojrws1#

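The problem is that ***wildcard queries are not run through the analysis chain***, so your query term is not stemmed the way the original text was.
For example, the token reiniger is reduced to reinig by the stem filter at index time. The unanalyzed pattern *reiniger cannot match it, because the index contains no token that ends in "reiniger".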

Input stream            |  Indexed tokens
-------------------------|--------------------------
 "Industrie-Reiniger"    |  "industri", "reinig"
 "Katalysator-Reiniger"  |  "katalysato", "reinig"
 "Flächenreiniger"       |  "flachenreinig"
 "UNIVERSALREINIGER"     |  "universalreinig"
 "FELGENREINIGER-GEL"    |  "felgenreinig", "gel"

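To make wildcard queries and fuzzy searches work properly alongside stemmers (and other filters that may truncate tokens), add KeywordRepeatFilterFactory before the stemmer in the analysis chain:
Emits each token twice, once with the KEYWORD attribute and once without.
If placed before a stemmer, the unstemmed token is preserved at the same position as the stemmed one. Queries matching the original exact term get a better score while still keeping the recall benefit of stemming. Another advantage of keeping the original token is that wildcard truncation works as expected.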
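
Applied to the text_de_de type from the question, the change could look roughly like the sketch below. It assumes the rest of the chain stays as posted and only re-enables the KeywordRepeatFilterFactory line that is currently commented out in the index analyzer; the stemmer is expected to skip the copy flagged as KEYWORD, and RemoveDuplicatesTokenFilterFactory removes the second copy whenever stemming left the token unchanged.

```xml
<!-- Sketch only: text_de_de from the question with KeywordRepeatFilterFactory enabled. -->
<fieldType name="text_de_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- Emit every token twice: one copy flagged KEYWORD, one unflagged. -->
        <filter class="solr.KeywordRepeatFilterFactory" />
        <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
        <!-- Stems only the unflagged copy, so "reiniger" is kept next to "reinig". -->
        <filter class="solr.GermanMinimalStemFilterFactory" />
        <!-- Drops the duplicate when the stemmed and unstemmed forms are identical. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <!-- Unchanged from the question; adding KeywordRepeatFilterFactory here too is optional. -->
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
        <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
        <filter class="solr.GermanMinimalStemFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
</fieldType>
```

Existing documents would need to be reindexed after such a schema change, since the current index holds only the stemmed tokens. The same change would apply to any other stemmed field type that wildcard queries run against, such as text_de.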
