I am using DataStax 6.8. This is my Solr schema:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="org.apache.solr.schema.TextField" name="NameField">
<analyzer type="index">
<filter class="solr.ASCIIFoldingFilterFactory"/>
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.NGramFilterFactory" maxGramSize="15" minGramSize="2"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.NGramFilterFactory" maxGramSize="15" minGramSize="2"/>
</analyzer>
</fieldType>
</types>
<fields>
<field indexed="true" multiValued="false" name="nama" type="StrField"/>
<field indexed="true" multiValued="false" name="nama_copy" type="NameField"/>
</fields>
<uniqueKey>(nama)</uniqueKey>
<copyField dest="nama_copy" source="nama"/>
</schema>
I have this field value in a row
Then I ran this query:
http://my_ip_address:8983/solr/search.form/select?wt=json&indent=true&fl=nama&q=nama_copy:batamindo\ v
I got good results:
{
"responseHeader":{
"status":0,
"QTime":8},
"response":{"numFound":579,"start":0,"docs":[
{
"nama":"BATAMINDO V "},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"},
{
"nama":"BATAMINDO V"}]
}}
But when I run
http://my_ip_address:8983/solr/search.form/select?wt=json&indent=true&fl=nama&q=nama_copy:batamindo\ vi
I get poor search results:
{
"responseHeader":{
"status":0,
"QTime":14},
"response":{"numFound":602,"start":0,"docs":[
{
"nama":"MV. VINCA"},
{
"nama":"MV. VINASHIP PEARL"},
{
"nama":"MV. VINASHIP PEARL"},
{
"nama":"MV. VINCENT TRADER"},
{
"nama":"MV. MEGHNA VICTORY"},
{
"nama":"MV. MEGHNA VICTORY"},
{
"nama":"NAVI SUNNY"},
{
"nama":"MV. MEGHNA VICTORY"},
{
"nama":"MT. GOLDEN VIOLET"},
{
"nama":"MT. GOLDEN VIOLET"}]
}}
What is going on here?
1 Answer
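What you are seeing is expected behaviour. The NGramFilterFactory class tokenizes strings into grams of N characters. In your case, strings are broken down into substrings of 2 to 15 characters, based on your schema definition. For an input string like cassandra, the N-gram filter generates the following grams (shown here up to length 4):
ca as ss sa an nd dr ra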
cas ass ssa san and ndr dra
cass assa ssan sand andr ndra
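
To make the gram list above concrete, here is a minimal Python sketch that mimics what an N-gram filter does to a single token. It is an illustration only, not Solr's implementation: the `ngrams` helper is a hypothetical name, the defaults simply mirror minGramSize="2" and maxGramSize="15" from the schema, and the real filter may emit grams in a different order.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of `token` with length min_gram..max_gram.

    Simplified model of an N-gram token filter; grams are grouped by size here.
    """
    grams = []
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        # Slide a window of `size` characters across the token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Print the grams of "cassandra" one size per line, as in the list above.
for size in (2, 3, 4):
    print(" ".join(g for g in ngrams("cassandra") if len(g) == size))

# Compare what the two query terms from the question turn into.
for term in ("v", "vi"):
    print(repr(term), "->", ngrams(term))

# Check which sample names share a gram with the term "vi".
for name in ("vinca", "navi", "batamindo"):
    print(name, "contains gram 'vi':", "vi" in ngrams(name))
```

Under this simplified model, a one-character term such as v yields no grams at all because it is shorter than the minimum gram size, while vi yields the single gram vi, which also appears among the grams of names like vinca and navi.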
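For the search term ss, a Solr query will get matches on ss, ass, ssa, assa, ssan, and so on. In your case, where the search term is vi, it is expected to match vinca, vinaship, vincent, victory, navi, violet, and so on. For details, see Document Analysis in Solr. Cheers!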