我的hql是:
select day,app_id,platform,count(1) as dau from (select ad.day,ads.app_id,ads.platform,ad.deviceid from mobile_ad_space ads inner join (select day,deviceid,flightid from mobile_day_adlog where day='$STAT_DAY') ad on cast(ads.space_id as string)=ad.flightid group by ad.day,ads.app_id,ads.platform,ad.deviceid) day_active group by day,app_id,platform having dau>5;
但结果包含重复数据:
| day | appid | platform | dau|
| 2016-06-29 | 1 | ios | 70533 |
| 2016-06-29 | 1 | android | 49307 |
| 2016-06-29 | 1 | android | 49307 |
| 2016-06-29 | 1 | android | 49307 |
我也尝试过其他hql:
SELECT day_active.day ,day_active.app_id ,day_active.platform ,count(1) AS dau FROM ( SELECT day_device.day AS day ,day_device.app_id AS app_id ,day_device.platform AS platform ,day_device.deviceid AS deviceid FROM (SELECT ad.day AS day ,ads.app_id AS app_id ,ads.platform AS platform ,ad.deviceid AS deviceid FROM mobile_ad_space ads INNER JOIN ( SELECT day ,deviceid ,flightid FROM mobile_day_adlog WHERE day = '$STAT_DAY' ) ad ON cast(ads.space_id AS string) = ad.flightid) day_device GROUP BY day_device.day ,day_device.app_id ,day_device.platform ,day_device.deviceid ) day_active GROUP BY day_active.day ,day_active.app_id ,day_active.platform HAVING dau>5 ORDER BY day_active.day ,day_active.app_id ,day_active.platform;
但它仍然有这个问题
有人能帮我吗?
2条答案
按热度按时间juud5qan1#
最后,我找到了解决办法。
在我的hql中,我使用
insert overwrite directory '$HQL_OUT_PATH'
覆盖输出。但是它看起来不稳定,导致重复项。因此,我在hql之前清理了输出路径,结果现在就出来了~!mlnl4t2r2#
尝试检查值是否包含空格,请使用group by中的trim()。还要按列检查每组有多少不同的值:
select distinct platform ...
等等