Given a Hive table like the following:
> desc T;
dim1 string
dim2 string
dim3 string
value1 int
value2 int
I am trying to randomly sample 1000 rows per group of (dim1, dim2, dim3).
One approach is:
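# bash
# List items in a bash for loop are separated by spaces, not commas.
for dim1 in dim1_1 dim1_2; do
  for dim2 in dim2_1 dim2_2; do
    for dim3 in dim3_1 dim3_2; do
      # The dim columns are strings, so the values are quoted in the SQL.
      hive -e "select * from T where dim1='$dim1' and dim2='$dim2' and dim3='$dim3' limit 1000;"
    done
  done
done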
This runs 2^3 = 8 queries sequentially. Is there a more efficient way?
1 Answer
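One possible way to avoid running a query per combination is a single query that numbers the rows of each (dim1, dim2, dim3) group in random order and keeps the first 1000 of each. The following is a minimal sketch, not a tested solution for this table. It assumes a Hive version with windowing functions (ROW_NUMBER() ... OVER), and the names N, QUERY, rn and ranked are illustrative and do not come from the question.

```shell
#!/usr/bin/env bash
# Sketch: one query instead of one query per (dim1, dim2, dim3) combination.
# N, QUERY, rn and ranked are illustrative names, not from the original post.
N=1000

# ROW_NUMBER() numbers the rows inside each group in random order,
# so keeping rn <= N retains up to N randomly chosen rows per group.
QUERY="
SELECT dim1, dim2, dim3, value1, value2
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY dim1, dim2, dim3 ORDER BY rand()) AS rn
  FROM T t
) ranked
WHERE rn <= ${N};
"

# Run the query only when the hive CLI is available; otherwise just print it.
if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
else
  printf '%s\n' "$QUERY"
fi
```

As written, this samples every group present in T. To restrict it to specific values, a WHERE clause on dim1, dim2 and dim3 could be added to the inner SELECT. Note also that a plain `limit 1000` returns whichever rows Hive reads first rather than a random sample, whereas ordering by rand() inside each partition randomizes the choice at the cost of sorting each group.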