This question already has an answer here:
Solr wildcard search incorrect result (4 answers)
Closed 13 days ago.
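I hope you can help me, because this problem is driving me crazy.
To keep it simple: I have some documents with a field named name_text_de_de, containing values like these: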
name_text_de_de
Industrie-Reiniger
Katalysator-Reiniger
Flächenreiniger
UNIVERSALREINIGER
FELGENREINIGER-GEL
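These are not all of them, just a sample. With this query I get the results above: `q=name_text_de_de:*reinig`
But with this query I get no results at all: `q=name_text_de_de:*reiniger`
That makes no sense to me.
What could be the problem here?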
Thanks in advance,
菲德
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
<filter class="solr.ManagedStopFilterFactory" managed="de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
<filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
<!-- <filter class="solr.GermanStemFilterFactory" /> -->
<!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
<filter class="solr.GermanMinimalStemFilterFactory" />
<!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
<filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
<!-- <filter class="solr.GermanStemFilterFactory" /> -->
<!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
<filter class="solr.GermanMinimalStemFilterFactory" />
<!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text_de_de" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
<filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
<!-- <filter class="solr.GermanStemFilterFactory" /> -->
<!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
<filter class="solr.GermanMinimalStemFilterFactory" />
<!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<!-- <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="lang/dictionary_de_de.txt" /> -->
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<!-- <filter class="solr.KeywordRepeatFilterFactory" /> -->
<filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German" /> -->
<!-- <filter class="solr.SnowballPorterFilterFactory" language="German2" /> -->
<!-- <filter class="solr.GermanStemFilterFactory" /> -->
<!-- <filter class="solr.GermanLightStemFilterFactory" /> -->
<filter class="solr.GermanMinimalStemFilterFactory" />
<!-- <filter class="solr.GermanNormalizationFilterFactory" /> -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text_spell_de" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text_spell_de_de" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="de_de" />
<filter class="solr.ManagedStopFilterFactory" managed="de_de" />
<!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de_de.txt" /> -->
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>
1 answer
drnojrws
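The problem is that ***wildcard queries do not go through the full analysis chain***: only multi-term-aware filters such as `LowerCaseFilterFactory` are applied to them, so your query term is not stemmed the way the indexed text was.
For example, at index time the stem filter truncates the token `reiniger` to `reinig`. That indexed token cannot match `*reiniger` (which is not stemmed), because the index contains no token ending in "reiniger". To make wildcard and fuzzy queries work as expected with stemmers (and with other filters that can truncate tokens), add `KeywordRepeatFilterFactory` to the analysis chain before the stemmer: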
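> Emits each token twice, once with the KEYWORD attribute and once without.
>
> If placed before a stemmer, the result is that the unstemmed token is kept at the same position as the stemmed token. Queries matching the original exact term get a better score while still keeping the recall benefit of stemming. Another advantage of keeping the original token is that wildcard truncation works as expected.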
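A minimal sketch of what this could look like for the `text_de_de` index analyzer from the question, assuming the rest of the chain stays as it is. It only enables the `KeywordRepeatFilterFactory` line that is already present (commented out) before the stemmer. `GermanMinimalStemFilterFactory` skips tokens marked as keywords, so only one of the two copies gets stemmed, and the existing `RemoveDuplicatesTokenFilterFactory` then drops the second copy whenever stemming did not change the token. The fragment replaces only the `<analyzer type="index">` element inside the `text_de_de` field type; for the wildcard case the query analyzer can stay as it is.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory" />
  <filter class="solr.ManagedStopFilterFactory" managed="de_de" />
  <filter class="solr.LowerCaseFilterFactory" />
  <!-- Emit every token twice (keyword + non-keyword) BEFORE stemming,
       so "reiniger" ends up in the index both as "reiniger" and as "reinig" -->
  <filter class="solr.KeywordRepeatFilterFactory" />
  <filter class="solr.KeywordMarkerFilterFactory" protected="lang/protwords_de_de.txt" />
  <!-- Stems only the copy that is not marked as a keyword -->
  <filter class="solr.GermanMinimalStemFilterFactory" />
  <!-- Removes the duplicate copy when the stemmer left the token unchanged -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
```

Because this changes index-time analysis, the documents have to be reindexed. After that, `q=name_text_de_de:*reiniger` should be able to match the unstemmed copies such as `reiniger` or `universalreiniger`, while `*reinig` keeps matching the stemmed ones.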