How can I filter buckets in ElasticSearch (Python) by comparing two sub-aggregation metrics?

Posted by kcwpcxri on 2023-03-29 in ElasticSearch

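My index contains documents with the following fields: user_id, user_name, post_text, post_sentiment, where post_sentiment is of type double and represents the sentiment of the post. A post_sentiment greater than 0 indicates a happy post, and a post_sentiment less than 0 indicates a sad post.
I am trying to retrieve the users who have more happy posts than sad posts. I am using the high-level ElasticSearch Python library.
I wrote the following function, which looks logically correct. However, running it produces the error message: TransportError(500, 'search_phase_execution_exception'). I have established that the problem is not the connection or the index but the structure of the query itself. Please point out what I might be doing wrong here.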

def users_more_sentiment_posts(search_object: Search):
    a = search_object.aggs.bucket(
            "users",
            "terms",
            field="user_id"
        ).metric(
            "positive_post_count_per_bucket",
            "range", 
            field="post_sentiment", 
            ranges= [{'from': 0.0}]
        ).metric(
            "negative_post_count_per_bucket",
            "range", 
            field="post_sentiment", 
            ranges= [{'to': 0.0}]
        ).pipeline(
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "positive_post_count_per_bucket._count",
                "negativeCount": "negative_post_count_per_bucket._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            "posts",
            "top_hits",
            size=10
        )

    response = search_object.execute()

pqwbnv8z  #1

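A range aggregation returns multiple buckets, and you need the document count of each individual bucket. I have merged the two ranges into a single aggregation and added keys so that the positive and negative range buckets can be addressed by name.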

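# Assumes the high-level elasticsearch_dsl package, which provides Search.
from elasticsearch_dsl import Search


def users_more_sentiment_posts(search_object: Search):
    # search_object.aggs is modified in place, so no assignment is needed.
    search_object.aggs.bucket(
            # Group the posts by user.
            "users",
            "terms",
            field="user_id"
        ).metric(
            # One range aggregation with two keyed buckets.
            # "from" is inclusive and "to" is exclusive, so a sentiment
            # of exactly 0 is counted in the "positive" bucket.
            "post_count_per_bucket",
            "range",
            field="post_sentiment",
            ranges=[{"from": 0, "key": "positive"}, {"to": 0, "key": "negative"}]
        ).pipeline(
            # Keep only the users whose positive count exceeds the negative count.
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "post_count_per_bucket['positive']._count",
                "negativeCount": "post_count_per_bucket['negative']._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            # Return up to 10 posts for each remaining user.
            "posts",
            "top_hits",
            size=10
        )
    response = search_object.execute()
    return response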
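
To debug a search_phase_execution_exception like the one in the question, it can help to look at the raw request body. The sketch below writes out, as a plain dict, the body I would expect the chained calls above to generate. The helper name build_request_body and its top_hits_size parameter are hypothetical and exist only for this illustration. The dict is hand-written rather than produced by the library, so treat it as an approximation and compare it against the output of search_object.to_dict() in your own environment.

```python
import json


def build_request_body(top_hits_size: int = 10) -> dict:
    """Hand-written approximation of the aggregation request (hypothetical helper)."""
    return {
        "aggs": {
            "users": {
                # One bucket per user.
                "terms": {"field": "user_id"},
                "aggs": {
                    # A single range aggregation with two keyed buckets.
                    "post_count_per_bucket": {
                        "range": {
                            "field": "post_sentiment",
                            "ranges": [
                                {"from": 0, "key": "positive"},
                                {"to": 0, "key": "negative"},
                            ],
                        }
                    },
                    # Sibling pipeline aggregation: drops user buckets
                    # for which the script evaluates to false.
                    "happy_posts": {
                        "bucket_selector": {
                            "buckets_path": {
                                "positiveCount": "post_count_per_bucket['positive']._count",
                                "negativeCount": "post_count_per_bucket['negative']._count",
                            },
                            "script": "params.positiveCount > params.negativeCount",
                        }
                    },
                    # Sample posts for each user that survives the filter.
                    "posts": {"top_hits": {"size": top_hits_size}},
                },
            }
        }
    }


body = build_request_body()
print(json.dumps(body, indent=2))
```

Note that the range aggregation and the bucket_selector sit side by side under the "users" terms aggregation, and that each buckets_path entry names one keyed bucket of the range aggregation rather than the aggregation as a whole.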
