How to make shorter (closer) token matches more relevant (edge_ngram)

Posted by gz5pxeao on 2021-06-14 in ElasticSearch


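I'm getting strange results when using the edge_ngram tokenizer for autocomplete, and I'm trying to make the results more meaningful. I copied this example from the Elasticsearch documentation.
I have documents with the following descriptions:
Apples, raw, without skin
Apples, raw, golden delicious, with skin
APPLEBEE'S, chili
Babyfood, fruit, applesauce, junior
If I search for apple, "APPLEBEE'S, chili" scores higher than "Apples, raw, without skin".
If I search for apples, "Babyfood, fruit, applesauce, junior" scores higher than "Apples, raw, golden delicious, with skin".
In both cases I want closer/shorter matches to score higher: when I search for apple or apples, results containing the word apples should score higher than APPLEBEE'S or applesauce.
My settings are: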

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

Query:

{
  "query": {
    "match": {
      "description": {
        "query": "apple",
        "operator": "and"
      }
    }
  }
}
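To see why the shortest description wins, here is a rough Python sketch (not Elasticsearch code) that imitates the autocomplete analyzer above (edge_ngram with min_gram 2, max_gram 20, token_chars letter, then lowercase) and orders the four descriptions by the BM25 tf component for a single query term. It assumes the default k1 = 1.2 and b = 0.75, that idf is identical for every document sharing the query term (so it cannot change the order), and it ignores Lucene's norm encoding and per-shard statistics; the helper names are hypothetical.

```python
import re

K1 = 1.2   # BM25 k1 (Elasticsearch default)
B = 0.75   # BM25 b (Elasticsearch default)


def edge_ngrams(text, min_gram=2, max_gram=20):
    """Rough imitation of the 'autocomplete' analyzer: edge_ngram + lowercase."""
    tokens = []
    # token_chars: ["letter"] -> any non-letter (space, comma, apostrophe) splits words.
    for word in re.findall(r"[^\W\d_]+", text):
        word = word.lower()
        # Words shorter than min_gram (e.g. the "S" in APPLEBEE'S) produce no tokens.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


def bm25_tf(freq, dl, avgdl, k1=K1, b=B):
    """tf component of BM25 in the form the explain API prints."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def rank(query_term, docs):
    """Order matching docs by BM25 tf only (idf is the same for one shared term)."""
    analyzed = {doc: edge_ngrams(doc) for doc in docs}
    avgdl = sum(len(tokens) for tokens in analyzed.values()) / len(analyzed)
    scores = {}
    for doc, tokens in analyzed.items():
        freq = tokens.count(query_term)
        if freq:
            scores[doc] = bm25_tf(freq, len(tokens), avgdl)
    return sorted(scores, key=scores.get, reverse=True)


DOCS = [
    "Apples, raw, without skin",
    "Apples, raw, golden delicious, with skin",
    "APPLEBEE'S, chili",
    "Babyfood, fruit, applesauce, junior",
]

print(len(edge_ngrams("APPLEBEE'S, chili")))  # 11
# The search analyzer (lowercase tokenizer) turns each query into a single term.
for term in ("apple", "apples"):
    print(term, rank(term, DOCS))
```

Under these assumptions every description contains its matching n-gram exactly once, so only field length matters: apple puts "APPLEBEE'S, chili" (11 n-grams) ahead of "Apples, raw, without skin" (16), and apples puts "Babyfood, fruit, applesauce, junior" (25) ahead of "Apples, raw, golden delicious, with skin" (26), which is the ordering described in the question. The 11 n-grams for "APPLEBEE'S, chili" also match the dl value in the answer's explain output.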

Source: https://stackoverflow.com/questions/64530450/how-to-make-shorter-closer-token-match-more-relevant-edge-ngram

1 answer

Answer #1 (pb3skfrl):

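This is caused by the length of the matched field, called dl in the BM25 algorithm that Elasticsearch uses for scoring. You can easily see the details by adding the explain parameter to your query:
http://{{hostname}}:{{port}}//_search?explain=true
Every description contains the n-gram apple exactly once, so term frequency and idf are the same for all of them and the field length decides the order. Because your APPLEBEE'S, chili document has the shortest field, it gets the highest score. Here is the tf part of the score for that document: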

{
  "value": 0.5344296,
  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
  "details": [
    {
      "value": 1.0,
      "description": "freq, occurrences of term within document",
      "details": []
    },
    {
      "value": 1.2,
      "description": "k1, term saturation parameter",
      "details": []
    },
    {
      "value": 0.75,
      "description": "b, length normalization parameter",
      "details": []
    },
    {
      "value": 11.0,
      "description": "dl, length of field",   <--- note this
      "details": []
    },
    {
      "value": 17.333334,
      "description": "avgdl, average length of field",
      "details": []
    }
  ]
}
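Plugging the explain values into the formula reproduces the reported tf and shows how a longer field lowers it. This is a minimal Python sketch of the formula as the explain description states it; the longer field lengths in the loop (16 and 25) are arbitrary values chosen for illustration.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """BM25 tf exactly as written in the explain description."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the explain output for "APPLEBEE'S, chili".
tf = bm25_tf(freq=1.0, dl=11.0, avgdl=17.333334)
print(round(tf, 7))  # 0.5344296

# The same single occurrence of the term in longer (arbitrary) fields scores lower.
for dl in (11.0, 16.0, 25.0):
    print(dl, bm25_tf(1.0, dl, 17.333334))
```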

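Solution
You need to add another subfield that uses the english analyzer, as in the multi-fields example. Here is the complete example.
Index mapping: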

{
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 20,
                    "token_chars": [
                        "letter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
                "fields": {
                    "english": {
                        "type": "text",
                        "analyzer": "english"
                    }
                }
            }
        }
    }
}

Index your sample documents:

{
    "name" : "Apples, raw, without skin"
}
{
    "name" : "APPLEBEE'S, chili"
}
{
    "name" : "Babyfood, fruit, applesauce, junior"
}
{
    "name" : "Apples, raw, golden delicious, with skin"
}

And the search query:

{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": "apple",
                        "fields": [
                            "name.english",
                            "name"
                        ]
                    }
                }
            ]
        }
    }
}
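The multi_match above sets no type, so it uses the default best_fields type; per the Elasticsearch documentation, best_fields scores a document by its single best-matching field and adds tie_breaker (default 0.0) times the scores of the other matching fields. A minimal Python sketch of that combination, using made-up per-field scores (not the answer's results) and a hypothetical helper name, shows why a strong name.english match can lift the real apples documents above documents that only match an edge n-gram prefix.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a best_fields (dis_max) query does."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # Fields other than the best one only count through tie_breaker (default 0).
    return best + tie_breaker * (sum(field_scores) - best)


# Made-up per-field scores as [name.english, name]; not taken from the answer.
apples_doc = [0.67, 0.11]    # assumed: the english analyzer matches "apple" against "Apples"
applebee_doc = [0.0, 0.13]   # assumed: only the edge n-gram field matches
print(best_fields_score(apples_doc))    # 0.67
print(best_fields_score(applebee_doc))  # 0.13
```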

And the search results. Note that the documents containing apples now come first:

"hits": [
    {
        "_index": "edgelow",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6747451,
        "_source": {
            "name": "Apples, raw, without skin"
        }
    },
    {
        "_index": "edgelow",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.60996956,
        "_source": {
            "name": "Apples, raw, golden delicious, with skin"
        }
    },
    {
        "_index": "edgelow",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.12822598,
        "_source": {
            "name": "APPLEBEE'S, chili"
        }
    },
    {
        "_index": "edgelow",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.09446116,
        "_source": {
            "name": "Babyfood, fruit, applesauce, junior"
        }
    }
]

Answered 2021-06-14
