Below is my data file, and I want to create a table from it. What is the delimiter?

jdzmm42g  posted on 2021-06-01  in  Hadoop

I am using the following query on my data in Hive.

CREATE EXTERNAL TABLE IF NOT EXISTS aircel1 (subscriberID INT, towerID STRING, dataDownloaded STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ''
STORED AS TEXTFILE
LOCATION '/user/username/name';

What should the delimiter be when the data looks like this?

subId=00001111911128052627towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.4621702216543667E17
subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17
subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17
subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17
subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.0366596827240525E17
subId=00001111911128052619towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.0686280014540467E17
subId=00001111911128052658towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9860890496178944E17
subId=00001111911128052652towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.303981333116041E17

thigvfpy1#

You could try a regex. It would be along these lines (I have not tested this):

CREATE EXTERNAL TABLE IF NOT EXISTS aircel1 (
  subscriberID STRING, towerID STRING, dataDownloaded STRING
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex'='subId=(.*)towerid=(.*)bytes=(.*)')
LOCATION '/user/username/dirname';
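
The pattern can be sanity-checked against a single sample line before the table is created. This is a minimal sketch, assuming a Hive version that accepts a SELECT without a FROM clause in the subquery (0.13 or later, as far as I know) and using the built-in regexp_extract function. The literal is the first line of the data in the question, and the alias names are only illustrative.

```sql
-- Pull each capture group out of one sample line from the question.
-- The aliases (line, sample) are hypothetical names used only for this check.
SELECT
  regexp_extract(line, 'subId=(.*)towerid=(.*)bytes=(.*)', 1) AS subscriberID,
  regexp_extract(line, 'subId=(.*)towerid=(.*)bytes=(.*)', 2) AS towerID,
  regexp_extract(line, 'subId=(.*)towerid=(.*)bytes=(.*)', 3) AS dataDownloaded
FROM (
  SELECT 'subId=00001111911128052627towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.4621702216543667E17' AS line
) sample;
```

For that line the three groups should come out as 00001111911128052627, 11232w34532543456345623453456984756894756 and 122112212212212218.4621702216543667E17. Note that regexp_extract searches within the string, whereas RegexSerDe, to my understanding, needs the pattern to match the entire line, so any extra leading or trailing characters in the file would likely produce NULL columns.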


enyaitl32#

We can use the code below to accomplish the task:

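-- Staging table: splitting each line on '=' yields four fields; the first (del) holds only the "subId" label.
create external table table1 (del string, subid string, towerid string, bytes double)
row format delimited
fields terminated by '='
location '/user/murali/';

-- Target table with the cleaned-up columns.
create table table2 (subid string, towerid string, bytes double);

-- Keep the first 20 characters of subid and the first 41 of towerid,
-- which drops the trailing "towerid" and "bytes" labels.
insert into table table2 select
    substring(subid,1,20),substring(towerid,1,41),bytes from table1;

select * from table2;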

9gm1akwq3#

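Using the equals sign as the delimiter, you can build the table in two steps.
First, create a temporary table in which every column is a string.
For example, one of the columns will then hold the string 00001111911128052627towerid (splitting on = also produces a leading column that contains only the label subId).
Then create the "real table" with the actual data types, and use substring operations to strip the trailing label such as "towerid" from each string, starting with that column.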
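
The two steps described above could be sketched roughly as follows. This is untested and rests on several assumptions: the table names aircel_staging and aircel_typed and the staging column names are hypothetical, the location is the one from the question, and regexp_replace is swapped in for a fixed-length substring so the query does not depend on every ID having the same length.

```sql
-- Step 1: staging table, every column a STRING, split on '='.
-- A line of the form subId=<id>towerid=<tower>bytes=<n> yields four fields.
CREATE EXTERNAL TABLE IF NOT EXISTS aircel_staging (  -- hypothetical name
  label STRING,        -- always the literal text "subId"
  subid_raw STRING,    -- "<id>towerid"
  towerid_raw STRING,  -- "<tower>bytes"
  bytes_raw STRING     -- "<n>"
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '='
STORED AS TEXTFILE
LOCATION '/user/username/name';

-- Step 2: the "real" table with the intended types.
-- subscriberID stays a STRING: the 20-digit IDs have leading zeros and do not fit in an INT.
CREATE TABLE IF NOT EXISTS aircel_typed (  -- hypothetical name
  subscriberID STRING,
  towerID STRING,
  dataDownloaded DOUBLE
);

-- Strip the trailing labels and cast the last field.
INSERT OVERWRITE TABLE aircel_typed
SELECT
  regexp_replace(subid_raw, 'towerid$', ''),
  regexp_replace(towerid_raw, 'bytes$', ''),
  CAST(bytes_raw AS DOUBLE)
FROM aircel_staging;
```

Anchoring the patterns with $ removes only a label at the very end of each field, so an ID that happened to contain the same letters elsewhere would be left intact.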
