这是我的配置单元查询,直接来自tpc ds工具箱:
WITH customer_total_return
AS (SELECT sr_customer_sk AS ctr_customer_sk,
sr_store_sk AS ctr_store_sk,
Sum(sr_fee) AS ctr_total_return
FROM store_returns,
date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year = 2000
GROUP BY sr_customer_sk,
sr_store_sk)
SELECT TOP 100 c_customer_id
FROM customer_total_return ctr1,
store,
customer
WHERE ctr1.ctr_total_return > (SELECT Avg(ctr_total_return) * 1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'TN'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id;
但是,在尝试运行它时出现以下错误:
失败:parseexception行11:11无法识别选择目标中“top”“100”“c\U customer\U id”附近的输入
我的理解是 TOP 100
在hiveql中语法无效。我怎样才能正确地重写这个?
2条答案
按热度按时间mkshixfv1#
这是多层次查询的一个坏例子。我建议:
笔记:
不要在句子中使用逗号
FROM
条款。始终使用适当的、明确的、标准的JOIN
语法。限定所有列引用,特别是当查询有多个表引用时。
不需要计算平均值的子查询。
hive 用途
LIMIT
,不是TOP
.3okqufwl2#
使用limit而不是top,如下所示: