How to drop duplicate values in Elasticsearch using a DSL query?

2guxujil posted on 2021-07-15 in ElasticSearch

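I want to get the attribute names returned by a query for CUSTOMER. The problem is that the attribute names contain many duplicate values, and I want to discard them. Can anyone help me?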

{
  "_source": [
    "attribute_name"
  ],
  "size": 500, 
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "hierarchy_name",
        "attribute_name"
      ]
    }
  }
}

Suppose this is my output. I want to drop the hits with duplicate attribute names:

"hits": [
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "tD6WDnkBQTXQIneq8Ypr",
    "_score": 2.5454113,
    "_source": {
      "attribute_name": "CUSTOMER"
    }
  },
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "3j6WDnkBQTXQIneq8Yps",
    "_score": 2.5454113,
    "_source": {
      "attribute_name": "CUSTOMER"
    }
  },
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "nT6WDnkBQTXQIneqyonu",
    "_score": 1.8101583,
    "_source": {
      "attribute_name": "REGION"
    }
  },
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "6D6WDnkBQTXQIneq8Yps",
    "_score": 1.8101583,
    "_source": {
      "attribute_name": "REGION"
    }
  }
]

My output should look like this:

"hits": [
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "3j6WDnkBQTXQIneq8Yps",
    "_score": 2.5454113,
    "_source": {
      "attribute_name": "CUSTOMER"
    }
  },
  {
    "_index": "planlytx_records",
    "_type": "_doc",
    "_id": "nT6WDnkBQTXQIneqyonu",
    "_score": 1.8101583,
    "_source": {
      "attribute_name": "REGION"
    }
  }
]
Answer from vjrehmav:

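You can use the collapse parameter to remove duplicates from the search results based on a field value. The field used for collapsing must be a single-valued keyword or numeric field with doc_values enabled. For that reason the example collapses on the attribute_name.keyword subfield rather than on the analyzed text field.

Here is a working example with the index mapping, index data, search query, and search result.

Index mapping: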

{
  "mappings": {
    "properties": {
      "attribute_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Index data:

{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER"
}
{
  "attribute_name": "CUSTOMER"
}

Search query:

{
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "attribute_name"
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}
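The following client-side Python sketch mimics what the collapse clause does to a result list. It keeps the highest-scoring hit for each distinct value and drops the others. It is only an illustration of the semantics and not how Elasticsearch implements collapsing.

A few assumptions:
- The function name collapse_hits is hypothetical.
- The sketch groups on the value stored in _source, whereas Elasticsearch uses the keyword field's doc values.
- When two duplicates have the same score, the sketch keeps the first one seen. Elasticsearch may keep a different one (the desired output in the question kept the second CUSTOMER hit).

The sample hits are taken from the question.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep only the highest-scoring hit for each distinct value of `field`.

    Hypothetical client-side approximation of the "collapse" clause; the
    grouping value is read from each hit's _source (an assumption).
    """
    best: dict[Any, dict[str, Any]] = {}
    for hit in hits:
        key = hit["_source"].get(field)
        # Strictly greater: on a score tie the first hit seen is kept.
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    # Return one hit per value, highest score first, like a relevance-sorted search.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


# Hits from the question, trimmed to the fields used here.
sample_hits = [
    {"_id": "tD6WDnkBQTXQIneq8Ypr", "_score": 2.5454113, "_source": {"attribute_name": "CUSTOMER"}},
    {"_id": "3j6WDnkBQTXQIneq8Yps", "_score": 2.5454113, "_source": {"attribute_name": "CUSTOMER"}},
    {"_id": "nT6WDnkBQTXQIneqyonu", "_score": 1.8101583, "_source": {"attribute_name": "REGION"}},
    {"_id": "6D6WDnkBQTXQIneq8Yps", "_score": 1.8101583, "_source": {"attribute_name": "REGION"}},
]

for hit in collapse_hits(sample_hits, "attribute_name"):
    print(hit["_id"], hit["_source"]["attribute_name"])
```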

Search result:

"hits": [
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.12199639,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.09271726,
        "_source": {
          "attribute_name": "CUSTOMER-ALL"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER-ALL"
          ]
        }
      }
    ]
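The collapse clause can be added unchanged to the query from the question. That query also searches hierarchy_name, restricts _source, and asks for up to 500 hits. With collapsing, size counts collapsed hits, so it limits the number of distinct values returned.

The Python sketch below only builds and prints the request body. It assumes that the question's planlytx_records index has an attribute_name.keyword subfield like the mapping above. The question does not show its mapping.

```python
import json

# The query from the question, plus a "collapse" clause so that only one hit
# per distinct attribute_name.keyword value is returned.
# Assumption: attribute_name has a keyword subfield, as in the mapping above.
search_body = {
    "_source": ["attribute_name"],
    "size": 500,
    "query": {
        "multi_match": {
            "query": "CUSTOMER",
            "fields": ["hierarchy_name", "attribute_name"],
        }
    },
    "collapse": {"field": "attribute_name.keyword"},
}

# The printed JSON is the body to send to the index's _search endpoint.
print(json.dumps(search_body, indent=2))
```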

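Update 1:
If you only want to remove the duplicates without filtering by a search term, you can run the query below. With no query clause, it matches all documents.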

{
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

The search result will be:

"hits": [
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attribute_name": "REGION"
        },
        "fields": {
          "attribute_name.keyword": [
            "REGION"
          ]
        }
      }
    ]
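The question only needs the distinct attribute names. Each collapsed hit carries its collapse key under "fields", so the names can be read straight from the response.

The minimal Python sketch below uses an abbreviated copy of the result above. The helper name distinct_names is hypothetical. In a full response, the hit array shown above sits under hits.hits.

```python
import json

# Abbreviated search response in the shape shown above (only the parts used here).
response_text = """
{
  "hits": {
    "hits": [
      {"_id": "1", "_score": 1.0,
       "_source": {"attribute_name": "CUSTOMER"},
       "fields": {"attribute_name.keyword": ["CUSTOMER"]}},
      {"_id": "3", "_score": 1.0,
       "_source": {"attribute_name": "REGION"},
       "fields": {"attribute_name.keyword": ["REGION"]}}
    ]
  }
}
"""


def distinct_names(response: dict, field: str = "attribute_name.keyword") -> list[str]:
    """Return the collapse key of each hit, in hit order (hypothetical helper)."""
    return [hit["fields"][field][0] for hit in response["hits"]["hits"]]


print(distinct_names(json.loads(response_text)))  # ['CUSTOMER', 'REGION']
```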
