Elasticsearch aggregation to filter users who had both of these events

ghhkc1vu  posted on 2022-11-28 in ElasticSearch

I wrote a query that works perfectly and returns the 6000+ login and login_error events:

GET /<app_logs-2022.11.23*>/_search
{
  "query": { 
    "bool": {
      "should": [
        {
          "term": {
            "context.identity.type": "login"
          }
        },
        {
          "term": {
            "context.identity.type": "login_error"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "_source": [
    "context.identity.user_id",
    "context.identity.type"
  ],
  "size": 3
}

I get a data set like this:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6001,
    "max_score" : 10.722837,
    "hits" : [
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "bb469377-0618-49a6-a643-1201dc84c829",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "72562ad0-4f35-4624-8776-8b555dea851e",
              "type" : "login"
            }
          }
        }
      },
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "8f4e82a0-f333-4096-bfb6-767fed924093",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "72562ad0-4f35-4624-8776-8b555dea851e",
              "type" : "login_error"
            }
          }
        }
      },
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "7090be5a-8b53-4723-a1ac-223476a000f1",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "75bcb301-1cee-4b3b-aa1b-adbe4c011388",
              "type" : "login_error"
            }
          }
        }
      }
    ]
  }
}

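However, I don't know how to get the number of users who have both login and login_error events. I have tried the cardinality aggregation, terms, and a few others, but they all just split the types into buckets that show totals instead of grouping by user. I want to find out how many users had problems at first but managed to log in in the end.
The best I have managed is to get buckets by user_id and output the cardinality of type for each bucket: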

"aggs": {
    "results": {
      "terms": {
        "field": "context.identity.user_id",
        "size": 300
      },
      "aggs": {
        "events": {
          "cardinality": {
            "field": "context.identity.type"
          }
        }
      }
    }
  }

ffvjumwh1#

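I created an example based on the sentence "I want to find how many users had problems first, but then managed to log in in the end". It works as follows:
1. Aggregate by user_id
2. Create a sub-aggregation by type
3. Ignore the user buckets that do not contain login_error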
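
The three steps above can be sketched in plain Python to make the logic explicit before looking at the Elasticsearch request. This is only an illustration: the function name users_with_login_error is hypothetical, and the documents mirror the sample data used in this answer.

```python
from collections import defaultdict

# Sample documents, same shape as the ones indexed in this answer.
docs = [
    {"context.identity.user_id": 1, "context.identity.type": "login_error"},
    {"context.identity.user_id": 1, "context.identity.type": "login"},
    {"context.identity.user_id": 2, "context.identity.type": "login"},
    {"context.identity.user_id": 3, "context.identity.type": "login"},
    {"context.identity.user_id": 4, "context.identity.type": "login_error"},
    {"context.identity.user_id": 4, "context.identity.type": "login"},
]


def users_with_login_error(documents):
    """Hypothetical helper: client-side equivalent of the three steps."""
    # Steps 1 and 2: group by user_id, then count events per type.
    counts = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        user = doc["context.identity.user_id"]
        counts[user][doc["context.identity.type"]] += 1
    # Step 3: drop every user that has no login_error event.
    return sorted(
        user for user, types in counts.items()
        if types.get("login_error", 0) > 0
    )


print(users_with_login_error(docs))
```

Note that step 3 checks only for login_error. In the sample data every user also has a login event, which is why this is enough to find the users with both.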

Put the mapping

PUT test_stack_login
{
  "mappings": {
    "properties": {
      "context.identity.user_id": {
        "type": "keyword"
      },
      "context.identity.type": {
        "type": "keyword"
      }
    }
  }
}

Index sample documents

POST test_stack_login/_bulk?refresh&pretty
{"index":{}}
{"context.identity.user_id":1,"context.identity.type":"login_error"}
{"index":{}}
{"context.identity.user_id":1,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":2,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":3,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":4,"context.identity.type":"login_error"}
{"index":{}}
{"context.identity.user_id":4,"context.identity.type":"login"}

Run the query

GET test_stack_login/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "context.identity.user_id",
        "size": 1000
      },
      "aggs": {
        "context_identity_type": {
          "terms": {
            "field": "context.identity.type",
            "size": 10
          }
        },
        "login_error_exist": {
          "bucket_selector": {
            "buckets_path": {
              "var1": "context_identity_type['login_error']>_count"
            },
            "script": "params.var1 != null"
          }
        }
      }
    }
  }
}
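
The query above keeps a user bucket whenever login_error is present, so it relies on that user also having a login event. If that cannot be assumed, one possible variant is to count each type with its own filter sub-aggregation and require both counts to be positive. The sketch below builds such a request body as a Python dict. The aggregation names users, logins, login_errors and has_both are my own placeholders, and the "logins>_count" path is how I understand buckets_path to address a filter aggregation's document count, so verify it against your Elasticsearch version.

```python
import json


def build_both_events_query(user_field="context.identity.user_id",
                            type_field="context.identity.type",
                            max_users=1000):
    """Hypothetical helper: request body that keeps users having both types."""

    def type_filter(value):
        # Single-bucket filter aggregation counting documents of one type.
        return {"filter": {"term": {type_field: value}}}

    return {
        "size": 0,
        "aggs": {
            "users": {
                "terms": {"field": user_field, "size": max_users},
                "aggs": {
                    "logins": type_filter("login"),
                    "login_errors": type_filter("login_error"),
                    # Keep the user bucket only if both counts are positive.
                    "has_both": {
                        "bucket_selector": {
                            "buckets_path": {
                                "ok": "logins>_count",
                                "err": "login_errors>_count",
                            },
                            "script": "params.ok > 0 && params.err > 0",
                        }
                    },
                },
            }
        },
    }


body = build_both_events_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a GET test_stack_login/_search request.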


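You get the user_ids whose context.identity.type values include login_error. In the sample data every user also has a login event, so the keys in the response are the user_ids with at least one failed and one successful login: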

"buckets" : [
  {"key" : "1" ...},
  {"key" : "4" ...}
]
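
Since the question asks for the number of such users, one simple option is to count the remaining buckets on the client side. A minimal sketch, assuming the response has been parsed into a Python dict and the outer aggregation is called NAME as in the query of this answer. The sample_response below is a trimmed, hand-written stand-in for a real response, not actual output.

```python
def count_users(response, agg_name="NAME"):
    """Hypothetical helper: number of user buckets left after filtering."""
    buckets = response["aggregations"][agg_name]["buckets"]
    user_ids = [bucket["key"] for bucket in buckets]
    return len(user_ids), user_ids


# Hand-written stand-in for the parsed search response (only the keys used here).
sample_response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {"key": "1"},
                {"key": "4"},
            ]
        }
    }
}

total, ids = count_users(sample_response)
print(total, ids)
```

Keep in mind that the terms aggregation returns at most size buckets (1000 in the query of this answer), so the count is only complete when the number of users stays below that limit.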
