How can I discard duplicate values in Elasticsearch using a DSL query?

Posted by 2guxujil on 2021-07-15 in ElasticSearch


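I want to get the attribute names that match a search for CUSTOMER. The problem is that the results contain many duplicate attribute_name values, which I would like to discard. Can anyone help?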

{
  "_source": [
    "attribute_name"
  ],
  "size": 500, 
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "hierarchy_name",
        "attribute_name"
      ]
    }
  }
}

Suppose this is my output. I want to drop the hits with duplicate attribute names:

{
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "tD6WDnkBQTXQIneq8Ypr",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "3j6WDnkBQTXQIneq8Yps",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "nT6WDnkBQTXQIneqyonu",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "6D6WDnkBQTXQIneq8Yps",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },

My output should look like this:

{
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "3j6WDnkBQTXQIneq8Yps",
        "_score": 2.5454113,
        "_source": {
          "attribute_name": "CUSTOMER"
        }
      },
      {
        "_index": "planlytx_records",
        "_type": "_doc",
        "_id": "nT6WDnkBQTXQIneqyonu",
        "_score": 1.8101583,
        "_source": {
          "attribute_name": "REGION"
        }
      },

elasticsearch logstash kibana elasticsearch-5 elasticsearch-dsl

Source: https://stackoverflow.com/questions/67276433/how-to-discard-the-duplicate-values-in-elasticsearch-using-dsl-query

1 answer


vjrehmav #1

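You can use the collapse parameter to remove duplicates from the search results based on a field value. The collapse field must be a single-valued keyword or numeric field, which is why the example collapses on the attribute_name.keyword subfield rather than on the text field itself.
Below is a working example with the index mapping, index data, search query, and search result.
Index mapping: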

{
  "mappings": {
    "properties": {
      "attribute_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Index data:

{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER-ALL"
}
{
  "attribute_name": "CUSTOMER"
}
{
  "attribute_name": "CUSTOMER"
}

Search query:

{
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "attribute_name"
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

Search result:

"hits": [
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.12199639,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67260491",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.09271726,
        "_source": {
          "attribute_name": "CUSTOMER-ALL"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER-ALL"
          ]
        }
      }
    ]
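
To make the effect of collapse concrete, the sketch below mimics it on the client side using the four sample hits from the question. It only illustrates the semantics (one top-ranked hit per distinct field value) and is not how Elasticsearch implements collapsing; `collapse_hits` is a hypothetical helper name, not part of any client library. When two hits share the same score, Elasticsearch gives no guarantee about which one survives; this sketch simply keeps the first one it sees.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep only the best-scoring hit for each distinct value of `field`."""
    # Highest score first; sorted() is stable, so tied hits keep their input order.
    ranked = sorted(hits, key=lambda h: h["_score"], reverse=True)
    seen: set[str] = set()
    kept = []
    for hit in ranked:
        value = hit["_source"][field]
        if value not in seen:
            seen.add(value)
            kept.append(hit)
    return kept


# The four hits shown in the question, trimmed to the relevant keys.
sample_hits = [
    {"_id": "tD6WDnkBQTXQIneq8Ypr", "_score": 2.5454113, "_source": {"attribute_name": "CUSTOMER"}},
    {"_id": "3j6WDnkBQTXQIneq8Yps", "_score": 2.5454113, "_source": {"attribute_name": "CUSTOMER"}},
    {"_id": "nT6WDnkBQTXQIneqyonu", "_score": 1.8101583, "_source": {"attribute_name": "REGION"}},
    {"_id": "6D6WDnkBQTXQIneq8Yps", "_score": 1.8101583, "_source": {"attribute_name": "REGION"}},
]

collapsed = collapse_hits(sample_hits, "attribute_name")
for hit in collapsed:
    print(hit["_id"], hit["_source"]["attribute_name"])
```

Doing the deduplication inside Elasticsearch with collapse is usually preferable to this client-side version, because the duplicates then never count against the size limit of the request.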

Update 1:
If you only want to remove the duplicates, without any search condition, you can run the query below:

{
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

The search result will be:

"hits": [
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attribute_name": "CUSTOMER"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER"
          ]
        }
      },
      {
        "_index": "67276433",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attribute_name": "REGION"
        },
        "fields": {
          "attribute_name.keyword": [
            "REGION"
          ]
        }
      }
    ]
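
Putting the pieces together for the original question, here is a minimal Python sketch that builds the asker's multi_match request with the collapse clause added, and then reads the distinct attribute names out of a response. It assumes the asker's index has an `attribute_name.keyword` subfield like the mapping above; the helper names `build_collapsed_query` and `distinct_names` are made up for illustration, and the sample response is a trimmed copy of the result shown above. Sending the body to the cluster is left out on purpose, so the block runs without a connection.

```python
import json
from typing import Any


def build_collapsed_query(term: str) -> dict[str, Any]:
    """Return the question's search body with a collapse clause added."""
    return {
        "_source": ["attribute_name"],
        "size": 500,
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["hierarchy_name", "attribute_name"],
            }
        },
        # Assumption: the index maps attribute_name with a keyword subfield.
        "collapse": {"field": "attribute_name.keyword"},
    }


def distinct_names(response: dict[str, Any]) -> list[str]:
    """Collect attribute_name from each (already collapsed) hit."""
    return [hit["_source"]["attribute_name"] for hit in response["hits"]["hits"]]


body = build_collapsed_query("CUSTOMER")
print(json.dumps(body, indent=2))

# Trimmed copy of the collapsed search result shown above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"attribute_name": "CUSTOMER"}},
            {"_id": "3", "_score": 1.0, "_source": {"attribute_name": "REGION"}},
        ]
    }
}
print(distinct_names(sample_response))
```

The printed JSON can be pasted into Kibana Dev Tools as the body of a `_search` request against the index.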

