Getting shingle results from Elasticsearch

Posted by 093gszye on 2023-01-20 in Elasticsearch

I am already familiar with the shingle analyzer, and I was able to create one as follows:

"index": {
      "number_of_shards": 10,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "shingle_analyzer": {
          "filter": [
            "standard",
            "lowercase",
            "filter_shingle"
          ]
        }
      },
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  }
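
To make concrete what a shingle filter configured like this emits, here is a rough pure-Python sketch. The function name `shingles` and the regex-based word split are assumptions made for illustration; the real standard tokenizer follows Unicode word-break rules and will differ on edge cases.

```python
import re


def shingles(text, size=2):
    """Return lowercased word shingles of exactly `size` words.

    Rough stand-in for a lowercase filter followed by a shingle filter with
    min_shingle_size == max_shingle_size == size and output_unigrams false.
    The regex split is a simplification, not the real standard tokenizer.
    """
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(shingles("Find the latest breaking news"))
# ['find the', 'the latest', 'latest breaking', 'breaking news']
```

Note that a single word produces no shingles at all in this sketch, because unigrams are not emitted.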

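I then use the analyzer defined in the mapping for a field in my document named content. The problem is that the content field is a very long text and I want to use it as data for an autocomplete suggester, so I only need the one or two words that follow the matched phrase. I wonder whether there is a way to get the search (or suggest or analyze) API result as shingles too. With the shingle analyzer, Elasticsearch itself indexes the text as shingles. Is there a way to access those shingles?
For example, the query I pass is: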

GET the_index/_search
{
  "_source": ["content"],
  "explain": true,
  "query": {
    "match": { "content.shngled_field": "news" }
  }
}

The result is:

{
  "took" : 395,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 7.8647532,
    "hits" : [
      {
        "_shard" : "[v3_kavan_telegram_201911][0]",
        "_node" : "L6vHYla-TN6CHo2I6g4M_A",
        "_index" : "v3_kavan_telegram_201911",
        "_type" : "_doc",
        "_id" : "g1music/70733",
        "_score" : 7.8647532,
        "_source" : {
          "content" : "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more."
....
}

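As you can see, the result contains the whole content field, which is a very long text. What I would like to get back instead is:

"content" : "news and information on"

that is, the matched shingle itself.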

lqfhib0f #1

After creating the index and ingesting a document:

PUT sh
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "shingled": {
            "type": "text",
            "analyzer": "shingle_analyzer"
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "filter_shingle"
          ]
        }
      },
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  }
}

POST sh/_doc/1
{
  "content": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?"
}

You can call _analyze with the corresponding analyzer to see how a given text is tokenized:

GET sh/_analyze
{
  "text": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?",
  "analyzer": "shingle_analyzer"
}
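
Once you have the _analyze response on the client side, picking out the shingles you care about is a small filtering step. The following is only a sketch: the helper name `shingles_starting_with` is hypothetical, and `sample_response` is a hand-written stand-in in the shape _analyze returns (a `tokens` list with `token`, offsets, `type` and `position`), not real output from the index above.

```python
def shingles_starting_with(analyze_response, prefix):
    """Collect distinct shingle tokens whose text starts with `prefix`,
    in the order they first appear in the response."""
    found = []
    for tok in analyze_response.get("tokens", []):
        text = tok["token"]
        if tok.get("type") == "shingle" and text.startswith(prefix) and text not in found:
            found.append(text)
    return found


# Hand-written sample for the text "the content field and the content type".
sample_response = {
    "tokens": [
        {"token": "the content", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0},
        {"token": "content field", "start_offset": 4, "end_offset": 17, "type": "shingle", "position": 1},
        {"token": "field and", "start_offset": 12, "end_offset": 21, "type": "shingle", "position": 2},
        {"token": "and the", "start_offset": 18, "end_offset": 25, "type": "shingle", "position": 3},
        {"token": "the content", "start_offset": 22, "end_offset": 33, "type": "shingle", "position": 4},
        {"token": "content type", "start_offset": 26, "end_offset": 38, "type": "shingle", "position": 5},
    ]
}

print(shingles_starting_with(sample_response, "content "))
# ['content field', 'content type']
```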

Or check out the term vectors information:

GET sh/_termvectors/1
{
  "fields" : ["content.shingled"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
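
For the autocomplete use case, the `terms` section of the term vectors response can be reduced to "which word follows this one, and how often". This is a minimal sketch under assumptions: `following_words` is a hypothetical helper, it assumes two-word shingles as configured above, and `sample_tv` is a trimmed, hand-written response with made-up frequencies rather than real output.

```python
def following_words(tv_response, field, word):
    """Map each word that follows `word` in a two-word shingle to its
    term_freq, most frequent first (ties broken alphabetically)."""
    terms = tv_response["term_vectors"][field]["terms"]
    result = {}
    for term, info in terms.items():
        first, _, rest = term.partition(" ")
        if first == word and rest:
            result[rest] = info["term_freq"]
    return dict(sorted(result.items(), key=lambda kv: (-kv[1], kv[0])))


# Trimmed, hand-written sample; frequencies are illustrative only.
sample_tv = {
    "found": True,
    "term_vectors": {
        "content.shingled": {
            "terms": {
                "the content": {"term_freq": 2},
                "the search": {"term_freq": 1},
                "content field": {"term_freq": 1},
            }
        }
    },
}

print(following_words(sample_tv, "content.shingled", "the"))
# {'content': 2, 'search': 1}
```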

Are you looking for highlighting as well?
