hive子查询-not-in子句

qf9go6mv  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(406)

我正在尝试对配置单元运行以下查询:

SELECT COUNT(*)
FROM mydata
WHERE store NOT IN (SELECT store_out
                    FROM ( SELECT a.store as store_out, COUNT(*) AS CNT
                             FROM mydata a
                             GROUP BY store) TB1
                    WHERE CNT > AVG(CNT) + STDDEV(CNT) AND  CNT < AVG(CNT) - STDDEV(CNT))

但是我得到了以下错误:

Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 3:6 Unsupported SubQuery Expression 'store': Correlating expression cannot contain unqualified column references.

如何以另一种方式编写此查询?
谢谢!

zujrkrfu

zujrkrfu1#

我没有你确切的数据,所以很难验证这一点,但我会做一些类似的事情

SELECT COUNT(*)
FROM (
  SELECT a.*
    , flg
  FROM mydata a
  LEFT OUTER JOIN (
    SELECT store_out, flg
    FROM (
      SELECT store_out
        , cnt
        , 1 AS flg
        , AVG(cnt)         OVER () AS avg_cnt
        , STDDEV_SAMP(cnt) OVER () AS std_cnt
      FROM (
        SELECT store AS store_out
          , COUNT(*) AS cnt
        FROM mydata
        GROUP BY store ) x
      ) y
    WHERE cnt > avt_cnt + std_cnt AND cnt < avg_cnt - std_cnt ) z
  ON a.store = z.store_out ) final
WHERE flg IS NULL

基本上,左键联接子查询并创建一个伪列。该列在主表中不存在,因此在所有flg值都为null的情况下,这些是您想要的存储。希望这有帮助。

相关问题