hive查询输出到文件

oxosxuxt  于 2021-06-04  发布在  Hadoop
关注(0)|答案(11)|浏览(634)

我用java代码运行hive查询。例子:
“从id>100的表中选择*”
如何将结果导出到hdfs文件。

wn9m85ua

wn9m85ua1#

有两种方法可以存储hql查询结果:
保存到hdfs位置

INSERT OVERWRITE DIRECTORY "HDFS Path" ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM XXXX LIMIT 10;

保存到本地文件

$hive  -e "select * from table_Name" > ~/sample_output.txt
$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
hc2pp10m

hc2pp10m2#

以下查询将直接将结果插入hdfs:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;
bf1o4zei

bf1o4zei3#

此命令将输出重定向到您选择的文本文件:

$hive -e "select * from table where id > 10" > ~/sample_output.txt
ibrsph3r

ibrsph3r4#

要设置输出目录和输出文件格式等,请尝试以下操作:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...

例子:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;
2sbarzqh

2sbarzqh5#

我同意tnguyen80的回答。请注意,当查询中有特定的字符串值时,最好用双引号将整个查询括起来。
例如:

$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
pinkon5k

pinkon5k6#

要在hdfs中直接保存文件,请使用以下命令:

hive> insert overwrite  directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;

这将把内容放在hdfs的/user/cloudera/sample文件夹中。

xbp102n0

xbp102n07#

理想的方法是使用“insert overwrite directory'/pathtofile'selectfrom temp where id>100”而不是“hive-e'selectfrom…'>/filepath.txt”

des4xlb0

des4xlb08#

在配置单元命令行界面中输入此行:
insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery; testViewQuery -一些具体的观点

cgyqldqp

cgyqldqp9#

这将把结果放在一个目录下以制表符分隔的文件中:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;
h43kikqp

h43kikqp10#

创建外部表
在表中插入数据
可选稍后删除表,因为它是一个外部表,所以不会删除该文件
例子:
正在创建外部表以将查询结果存储在“/user/myname/projecta\u additionadata/”

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

将查询结果输入temp表

insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER  
     join CITY C on (T.location_name = C.location);

删除临时表

drop table additionaData
zu0ti5jz

zu0ti5jz11#

@sarath如果我想从不同的表运行另一个select*命令并写入同一个文件,如何覆盖该文件?
插入覆盖本地目录'/home/training/mydata/outputs'选择expl,计数(expl)为总计
from(select explode(splits)as expl from(select split(words,'')as splits from wordcount)t2)t3按expl分组;
这是萨拉问题的一个例子
上面是存储在本地目录中的输出文件中的字数计数作业:)

相关问题