Elasticsearch matches values containing special characters only when querying .raw

Posted by gmxoilav on 2022-11-07 in Lucene


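I started using Elasticsearch a few days ago. I created some analyzers and mappings and successfully indexed some data. The problem appears when I query data that contains special characters. At first I used the standard analyzer, but after reading about the other options I switched to whitespace, because it also keeps special characters as tokens. I still could not query the data. If I query field.raw instead (where field is the actual property of the object), I get the results I need. However, .raw bypasses the analyzer, and I wonder whether that defeats its purpose. Since whitespace did not work for me, I reverted to standard.
Here is the analyzer I built: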

PUT demoindex
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "splcharfilter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([?/-])"
          ]
        }
      },
      "analyzer": {
        "my_field_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram",
            "splcharfilter"
          ]
        }
      }
    }
  }
}

The mapping I built:

PUT demoindex/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "area": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}

The query that does not work:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

The query that works:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area.raw": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

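As you can see, I had to use area.raw to match the content and return the document. name is not expected to contain special characters, so querying it without .raw should be fine, but other fields will contain special characters, and those may not be limited to -.
So, can anyone point out what I am doing wrong or what I have misunderstood? Or is there a better way to achieve this?
P.S. Version info:
Elasticsearch: 7.10.1
Lucene: 8.7.0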

Source: https://stackoverflow.com/questions/71217647/elasticsearch-shows-match-with-special-character-with-only-raw

1 Answer

hgb9j2n61#

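1. Keyword fields are not analyzed.
2. Text fields are analyzed.
To check how a field is analyzed and which tokens are generated, you can use the Analyze API in Elasticsearch.
In your case: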

POST demoindex/_analyze
{
  "text": ["is - application"],
  "field": "area"
}

It outputs:

{
  "tokens" : [
    {
      "token" : "i"
    },
    {
      "token" : "is"
    },
    {
      "token" : "a"
    },
    {
      "token" : "ap"
    },
    {
      "token" : "app"
    },
    {
      "token" : "appl"
    },
    {
      "token" : "appli"
    },
    {
      "token" : "applic"
    },
    {
      "token" : "applica"
    },
    {
      "token" : "applicat"
    },
    {
      "token" : "applicati"
    },
    {
      "token" : "applicatio"
    },
    {
      "token" : "application"
    }
  ]
}
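To see where the "-" goes, the analyzer chain can be approximated in a few lines of Python. This is only a rough sketch: the standard tokenizer is approximated by a regular expression over word characters, and the helper names `edge_ngrams` and `analyze_area` are made up for illustration. The pattern_capture filter is left out because, once the tokenizer has dropped the "-", there is nothing left for it to capture.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_area(text):
    """Approximate my_field_analyzer: tokenize, lowercase, edge n-gram.

    Assumption: runs of word characters stand in for the standard
    tokenizer, which discards a standalone "-" as punctuation.
    """
    tokens = []
    for word in re.findall(r"\w+", text):
        tokens.extend(edge_ngrams(word.lower()))
    return tokens


tokens = analyze_area("is - application")
print(tokens)
print("-" in tokens)  # False: the hyphen never reaches the index
```

Under these assumptions the sketch yields the same 13 tokens as the _analyze output above, and none of them contains the hyphen or the full original string.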

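Because area.raw is of type keyword, the value "is - application" is indexed as is, as a single term, which is why your term query below works.
A term query performs an exact match and should be used on fields that are not analyzed, such as area.raw, which is a keyword field in your case.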

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area.raw": {
              "value": "is - application"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

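When you apply the same term query to the text field, however, it does not work: the term query does not analyze the search value and looks for the exact term "is - application", but as shown above the area value was tokenized at index time, so no such term exists in the index.
So, as Elasticsearch recommends, it is better to use a match query to search text (analyzed) fields: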

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "area": {
              "query": "is - application"
            }
          }
        }
      ]
    }
  },
  "size": 15
}
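The difference between the two query types can be illustrated with a small Python model of the index. It is a simplification, not how Elasticsearch is implemented: the indexed terms are copied from the _analyze output above, the `simple` search analyzer from the mapping is approximated by splitting on non-letters (ASCII only here) and lowercasing, and `term_query`, `match_query` and `simple_analyzer` are hypothetical helper names.

```python
import re

# Terms indexed for the text field `area`, taken from the _analyze output above.
area_terms = {
    "i", "is",
    "a", "ap", "app", "appl", "appli", "applic", "applica",
    "applicat", "applicati", "applicatio", "application",
}

# The keyword sub-field `area.raw` indexes the whole value as a single term.
area_raw_terms = {"is - application"}


def simple_analyzer(text):
    """Rough stand-in for the `simple` analyzer: split on non-letters, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]


def term_query(indexed_terms, value):
    """term: the value is not analyzed and must equal an indexed term exactly."""
    return value in indexed_terms


def match_query(indexed_terms, text):
    """match: the text is analyzed first; with the default OR operator,
    one matching token is enough."""
    return any(tok in indexed_terms for tok in simple_analyzer(text))


value = "is - application"
print(term_query(area_terms, value))      # False: no such term in `area`
print(term_query(area_raw_terms, value))  # True: exact term in `area.raw`
print(match_query(area_terms, value))     # True: "is" and "application" are indexed
```

Note that in this model, as with a real match query using the default OR operator, a document matches as soon as one query token is found, so the match query is looser than the exact comparison on area.raw.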

Answered 2022-11-07
