ElasticSearch: Manipulating the score by a numeric field value

p4tfgftt  posted on 2022-11-02 in ElasticSearch

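I'm using Elastic to search PDF files. One of the fields indexed alongside the PDF content is doridat, a date stored as an integer. The newest documents should get a higher score (a higher ranking), so the higher the value of doridat, the higher the score should be. Only the match in attachment.content and the value of doridat should influence the score.

How can I force the scoring to take the value of the doridat field into account?

My query: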

GET /attachments/_search 
{  
  "size": 2,
  "from": 0,
  "query": {
    "wildcard": {
      "attachment.content": {
        "value": "*berg*",
        "rewrite": "scoring_boolean"
      }
    }
  },
  "highlight":{
    "fields":{
      "attachment.content":{}
    }
  },
  "_source": {
    "excludes": "attachment.content"
  }
}

My mapping:

{
  "attachments" : {
    "mappings" : {
      "properties" : {
        "attachment" : {
          "properties" : {
            "author" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "content" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "content_length" : {
              "type" : "long"
            },
            "content_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "creator_tool" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "date" : {
              "type" : "date"
            },
            "description" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "detect_language" : {
              "type" : "boolean"
            },
            "format" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "indexed_chars" : {
              "type" : "long"
            },
            "keywords" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "language" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "metadata_date" : {
              "type" : "date"
            },
            "modified" : {
              "type" : "date"
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "daname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "do__nr" : {
          "type" : "integer"
        },
        "do_typ" : {
          "type" : "integer"
        },
        "doext" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "doname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "donr" : {
          "type" : "integer"
        },
        "doridat" : {
          "type" : "integer"
        },
        "dowww" : {
          "type" : "integer"
        },
        "id" : {
          "type" : "integer"
        },
        "path" : {
          "type" : "text",
          "analyzer" : "windows_path_hierarchy_analyzer"
        }
      }
    }
  }
}
9rbhqvlz  #1

I think wildcard always returns 1.0 as the score for a match (even if it matches more than once).
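Rank feature looks like a good fit for your use case. You would need to copy the doridat field and index it with the rank_feature field type. You will then be able to use that field in a Rank feature query. Which Elasticsearch version are you using?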
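As a minimal sketch of the rank feature route, the Python below only builds the two request bodies and prints them as JSON for pasting into the console. The field name doridat_rank and the helper function names are assumptions made for illustration; they do not appear in the original mapping. Existing documents would also need to be reindexed or updated so the new field receives a copy of doridat, and rank_feature values must be strictly positive.

```python
import json

# Assumption: "doridat_rank" is a hypothetical field name, not part of the
# original mapping. It is meant to hold a copy of doridat.
RANK_FIELD = "doridat_rank"


def rank_feature_mapping(field=RANK_FIELD):
    """Body for PUT /attachments/_mapping: adds a rank_feature field."""
    return {"properties": {field: {"type": "rank_feature"}}}


def rank_feature_search(pattern, field=RANK_FIELD):
    """Body for GET /attachments/_search.

    The wildcard is the required clause. The rank_feature clause is optional
    and its score is added on top, so larger field values rank higher.
    """
    return {
        "query": {
            "bool": {
                "must": {"wildcard": {"attachment.content": {"value": pattern}}},
                "should": {"rank_feature": {"field": field}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(rank_feature_mapping(), indent=2))
    print(json.dumps(rank_feature_search("*berg*"), indent=2))
```

With no function given, the rank feature query uses its default saturation function. A boost on the should clause is one way to weight recency against the text match.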
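Another option is to use a Script score query. You can basically return doridat from your script, since wildcard always returns 1.0 as the score. You could also use an N-gram tokenizer for attachment.content to get matching similar to a wildcard query. When you use match instead of wildcard, the matches are scored better.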
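The script score idea could look roughly like the sketch below, which again just builds the request body for GET /attachments/_search. The helper name script_score_search is an assumption for illustration. The Painless script returns the doridat value as the score and falls back to 0 for documents that have no value, since scores must not be negative.

```python
import json


def script_score_search(pattern):
    """Body for GET /attachments/_search using a script_score query.

    The inner wildcard query selects the documents; the Painless script
    replaces their score with the value of doridat (0 if the field is empty).
    """
    source = "doc['doridat'].size() == 0 ? 0 : doc['doridat'].value"
    return {
        "query": {
            "script_score": {
                "query": {
                    "wildcard": {"attachment.content": {"value": pattern}}
                },
                "script": {"source": source},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(script_score_search("*berg*"), indent=2))
```

Two caveats apply to this sketch. The script can also read _score if the text match should still contribute to the result. And scores are 32-bit floats, so if doridat holds large integers (above roughly 16.7 million, as a yyyyMMdd style value would), neighbouring values may round to the same score.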
The docs state that rank feature has better performance (it can skip documents at search time).
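For the N-gram alternative mentioned in the answer, a rough sketch follows. The index name attachments_ngram, the tokenizer and analyzer names, and the gram sizes are all assumptions chosen for illustration. A new index is assumed because the analyzer of an existing text field cannot be changed in place. The gram sizes of 3 and 4 stay within the default index.max_ngram_diff of 1.

```python
import json


def ngram_index_body(min_gram=3, max_gram=4):
    """Body for PUT /attachments_ngram (hypothetical index name).

    Defines an n-gram analyzer and applies it to attachment.content so that
    substrings such as "berg" are indexed as their own terms.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "content_ngram": {  # hypothetical tokenizer name
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "content_ngram_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "content_ngram",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "attachment": {
                    "properties": {
                        "content": {
                            "type": "text",
                            "analyzer": "content_ngram_analyzer",
                        }
                    }
                },
                "doridat": {"type": "integer"},
            }
        },
    }


def match_search(text):
    """Body for GET /attachments_ngram/_search: match instead of wildcard."""
    return {
        "query": {
            "match": {"attachment.content": {"query": text, "operator": "and"}}
        }
    }


if __name__ == "__main__":
    print(json.dumps(ngram_index_body(), indent=2))
    print(json.dumps(match_search("berg"), indent=2))
```

Because the match query produces a real relevance score, it can be combined with either of the recency approaches above. The trade-off is a larger index, and the operator setting only approximates substring matching for search terms longer than max_gram.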
