ElasticSearch: scoring by the numeric value of a field

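I use Elastic to search PDF files. One of the fields indexed alongside the PDF content is doridat, a date stored as an integer. The newest documents should get a higher score (a higher ranking), which means the higher the value of doridat, the higher the score should be. Only the match in attachment.content and the value of doridat should influence the score.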

How can I force the scoring to take the value of the field (doridat) into account?

My query:

GET /attachments/_search 
{  
  "size": 2,
  "from": 0,
  "query": {
    "wildcard": {
      "attachment.content": {
        "value": "*berg*",
        "rewrite": "scoring_boolean"
      }
    }
  },
  "highlight":{
    "fields":{
      "attachment.content":{}
    }
  },
  "_source": {
    "excludes": "attachment.content"
  }
}

My mapping:

{
  "attachments" : {
    "mappings" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "author" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "content" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "creator_tool" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "date" : {
              "type" : "date"
            },
            "description" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "detect_language" : {
              "type" : "boolean"
            },
            "format" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "indexed_chars" : {
              "type" : "long"
            },
            "keywords" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "language" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "metadata_date" : {
              "type" : "date"
            },
            "modified" : {
              "type" : "date"
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "daname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "do__nr" : {
          "type" : "integer"
        },
        "do_typ" : {
          "type" : "integer"
        },
        "doext" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "doname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "donr" : {
          "type" : "integer"
        },
        "doridat" : {
          "type" : "integer"
        },
        "dowww" : {
          "type" : "integer"
        },
        "id" : {
          "type" : "integer"
        },
        "path" : {
          "type" : "text",
          "analyzer" : "windows_path_hierarchy_analyzer"
        }
      }
    }
  }
}

I think wildcard always returns 1.0 as the score for a match (even if the term matches more than once).
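Rank feature looks like a good fit for your use case. You would need to duplicate the doridat field and index the copy with the rank_feature field type. You can then use that field in a rank_feature query. Which Elasticsearch version are you using?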
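A minimal sketch of the rank feature approach described above, assuming Elasticsearch 7.0 or later. The field name doridat_rank is hypothetical, and the _update_by_query step is only one possible way to populate it for existing documents; new documents would need the value set at index time. rank_feature fields only accept strictly positive values, so this assumes doridat is always greater than zero.

```
# Add a rank_feature copy of doridat (doridat_rank is an assumed name)
PUT /attachments/_mapping
{
  "properties": {
    "doridat_rank": { "type": "rank_feature" }
  }
}

# Populate the new field on documents that already have doridat
POST /attachments/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.doridat_rank = ctx._source.doridat"
  },
  "query": {
    "exists": { "field": "doridat" }
  }
}

# The wildcard clause selects documents; the rank_feature clause adds to the score
GET /attachments/_search
{
  "query": {
    "bool": {
      "must": {
        "wildcard": {
          "attachment.content": { "value": "*berg*" }
        }
      },
      "should": {
        "rank_feature": { "field": "doridat_rank" }
      }
    }
  }
}
```

With no function specified, the rank_feature query uses a saturation function, so larger doridat values score higher but the contribution stays below 1.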
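Another option is to use a script_score query. You can basically return doridat from the script, since wildcard always returns 1.0 as the score. You could also use an N-gram tokenizer for attachment.content to get wildcard-like matching. If you then use match instead of wildcard, the matches will be scored more meaningfully.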
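The script_score suggestion could look roughly like the following, again assuming Elasticsearch 7.0 or later. The guard for documents without a doridat value is an addition for illustration, not something stated in the comment.

```
GET /attachments/_search
{
  "query": {
    "script_score": {
      "query": {
        "wildcard": {
          "attachment.content": { "value": "*berg*" }
        }
      },
      "script": {
        "source": "doc['doridat'].size() == 0 ? 0 : doc['doridat'].value"
      }
    }
  }
}
```

Here the score of each hit is simply its doridat value, so newer documents rank first. Scores are stored as single-precision floats, so if doridat holds large integers, neighbouring values may end up with the same score; subtracting a fixed base value inside the script is one possible workaround.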
The documentation states that rank feature performs better (it can skip non-competitive documents at search time).
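As for the N-gram suggestion above, a rough sketch follows. Changing the analyzer of an existing field requires reindexing, so this uses a separate index; the index name attachments_ngram, the tokenizer and analyzer names, and the gram sizes are all assumptions chosen for illustration.

```
# New index whose attachment.content is split into 3- and 4-character grams
PUT /attachments_ngram
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "content_ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "content_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "content_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "attachment": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "content_ngram_analyzer"
          }
        }
      }
    }
  }
}

# match replaces the wildcard; "and" requires every gram of the search term to be present
GET /attachments_ngram/_search
{
  "query": {
    "match": {
      "attachment.content": {
        "query": "berg",
        "operator": "and"
      }
    }
  }
}
```

The match query produces a relevance score based on the matching grams, and it can be combined with either of the doridat approaches above.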
