Complex Hive query

Posted by cdmah0mi on 2021-06-03 in Hadoop

Hi, I have the following table:

ID | time
===+========
5  | 200101
3  | 200102
2  | 200103
12 | 200101
16 | 200103
18 | 200106

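Now I want to know how often each month of a year occurs. A plain GROUP BY is not enough, because it only counts the months that actually appear in the table, and I also want a 0 for every month that does not appear. So the output should look like this: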

time   | count
=======+======
200101 | 2
200102 | 1
200103 | 2
200104 | 0
200105 | 0
200106 | 1

Sorry for the poor table formatting; I hope it is still clear what I mean. I would appreciate any help.

hadoop Hive

Source: https://stackoverflow.com/questions/17452795/complex-hive-query

1 answer

wfveoks01#

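You can provide a year-month table that contains every year and month you care about. I wrote a script for you that generates such a CSV file: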


#!/bin/bash
# year_month.sh: write one YYYYMM value per line to year_month.csv

start_year=1970
end_year=2015

for year in $(seq "${start_year}" "${end_year}"); do
    for month in $(seq 1 12); do
        # zero-pad the month so that January 1970 becomes 197001
        printf '%d%02d\n' "${year}" "${month}"
    done
done > year_month.csv

Save it as year_month.sh and run it. You will get a file year_month.csv containing every year-month value from 1970 through 2015. You can change start_year and end_year to set the range of years.
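If you would rather pass the range on the command line than edit the script, the same loop can be wrapped in a function. This is only a sketch: the name gen_year_month and its two positional arguments are illustrative and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): print one YYYYMM value
# per line for every month from start_year through end_year.
gen_year_month() {
    local start_year=$1 end_year=$2 year month
    for year in $(seq "$start_year" "$end_year"); do
        for month in $(seq 1 12); do
            # zero-pad the month so that month 1 becomes 01
            printf '%d%02d\n' "$year" "$month"
        done
    done
}

# Example: cover only 2001, the year used in the question's sample data.
gen_year_month 2001 2001
```

Redirecting the output (for example, gen_year_month 2001 2001 > year_month.csv) gives a file in the same one-value-per-line layout as the script above.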
Then upload year_month.csv to HDFS. For example:

hadoop fs -mkdir /user/joe/year_month
hadoop fs -put year_month.csv /user/joe/year_month/

After that, you can expose year_month.csv to Hive as an external table. For example:

create external table if not exists 
year_month (time int) 
location '/user/joe/year_month';

Finally, join the new table with your own table to get the result. For example, assuming your table is named id_time:

from (select year_month.time as time, id_time.id as id 
      from year_month 
      left outer join id_time 
      on year_month.time = id_time.time) temp
select time, count(id) as count 
group by time;

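Note: you will need to make small adjustments to the statements above (such as the path and the column type). Months with no matching rows come out as 0 because the left outer join yields a NULL id for them and count(id) skips NULLs.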
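Before running the join on a cluster, it can help to check the expected counts locally. The sketch below emulates the left outer join plus count with awk on the question's sample rows. The function name count_per_month, the temporary file names, and the "id,time" CSV layout for the rows are assumptions made for illustration; it is not a replacement for the Hive query.

```shell
#!/bin/bash
# Hypothetical local check (not from the original answer).
# months_file: one YYYYMM value per line; rows_file: "id,time" per line.
count_per_month() {
    local months_file=$1 rows_file=$2
    awk -F',' '
        FILENAME == ARGV[1] { seen[$2]++; next }   # rows file: tally rows per time value
        { printf "%s\t%d\n", $1, seen[$1] }        # months file: print every month, 0 if never seen
    ' "$rows_file" "$months_file"
}

# Sample data taken from the question.
tmp=$(mktemp -d)
printf '%s\n' 200101 200102 200103 200104 200105 200106 > "$tmp/months.csv"
printf '%s\n' 5,200101 3,200102 2,200103 12,200101 16,200103 18,200106 > "$tmp/rows.csv"
count_per_month "$tmp/months.csv" "$tmp/rows.csv"
rm -r "$tmp"
```

Every line of the months file produces one output line, which mirrors how the left outer join keeps every row of year_month; 200104 and 200105 are printed with a count of 0.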

Answered 2021-06-04
