配置单元连接选择非常慢

h6my8fg2  于 2021-05-27  发布在  Hadoop
关注(0)|答案(1)|浏览(478)

嗨,我有两个表:用户信息,ip地址,一个是50000,另一个是100000。现在需要使用user表的ip来检查属性,将ip转换为int并将interval与ip\u位置进行比较
我的配置单元版本是3.0.0,这个版本没有索引
ip地址:在此处输入图像描述
在pg中此操作非常快:

set search_path=res;
select * from(
select ip,
(split_part(ip,'.',1)::bigint*256*256*256
+split_part(ip,'.',2)::bigint*256*256
+split_part(ip,'.',3)::bigint*256
+split_part(ip,'.',4)::bigint)::int8 as ipvalue
from user_info) t1
left join ip_location t2 on 
ipv4_val_begin=(select max(ipv4_val_begin) from ip_location where ipv4_val_begin <= ipvalue);

但我在hive上找不到这种语法的替代方法:

select ip,
t2.location_country
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[0] as bigint)*256*256
+cast(split(ip,"\\.")[0] as bigint)*256
+cast(split(ip,"\\.")[0] as bigint) as ipvalue
from source.v_dm_vip_user t1
left join res.ip_location t2 on 
ipv4_val_begin=(select max(ipv4_val_begin) from res.ip_location where ipv4_val_begin <= ipvalue);

错误:在此处输入图像描述
换成下面的sql,可以查询成功,但是速度很慢,需要1天:

select ip,
t2.location_country
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[0] as bigint)*256*256
+cast(split(ip,"\\.")[0] as bigint)*256
+cast(split(ip,"\\.")[0] as bigint) as ipvalue
from source.v_dm_vip_user t1
left join res.ip_location t2 on 
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[0] as bigint)*256*256
+cast(split(ip,"\\.")[0] as bigint)*256
+cast(split(ip,"\\.")[0] as bigint) > ipv4_val_begin
and 
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[0] as bigint)*256*256
+cast(split(ip,"\\.")[0] as bigint)*256
+cast(split(ip,"\\.")[0] as bigint) < ipv4_val_end;

有更好更快的sql吗?我试了很多,但没用,谢谢。

mkh04yzy

mkh04yzy1#

我尝试了视图和行组索引,但它不能加快很多。我想问一下如何使用Hive加速ip地址范围这样,和Hive的速度也慢Spark。

相关问题