"Can not create a Path from an empty string" error when using the HCatalog JSON SerDe

h79rfbju  posted 2021-06-03  in  Hadoop

I'm trying to use a Hive table with the HCatalog JSON SerDe (from hcatalog-core-0.5.0-cdh4.7.0.jar). I'm running CDH4 (hadoop-2.0.0-cdh4.7.0 and hive-0.10.0-cdh4.7.0).
The table definition:

CREATE EXTERNAL TABLE some_table(
  user_id int COMMENT 'from deserializer',
  event_time int COMMENT 'from deserializer',
  some_string string COMMENT 'from deserializer',
  some_id string COMMENT 'from deserializer',
  another_id int COMMENT 'from deserializer')
PARTITIONED BY (
  year int,
  month int,
  day int)
ROW FORMAT SERDE
  'org.apache.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:8020/somedir/some_table'
TBLPROPERTIES (
  'last_modified_by'='volker',
  'last_modified_time'='1424980336',
  'transient_lastDdlTime'='1424980952')

Partitions are created like this:

alter table some_table add if not exists partition (year=2015,month=02,day=26) location '/somedir/some_table/year=2015/month=02/day=26'
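Since the partition columns are declared as int but the values are given zero-padded (month=02, day=26), it can be worth confirming how the metastore actually recorded the partition spec and its location. A minimal check, assuming the standard Hive commands:

```sql
-- List the registered partition specs as the metastore stores them.
SHOW PARTITIONS some_table;

-- Inspect one partition's recorded location and properties.
DESCRIBE FORMATTED some_table PARTITION (year=2015, month=2, day=26);
```

If the spec shown by SHOW PARTITIONS does not match what the queries use, that mismatch is worth ruling out before digging deeper.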

This works fine at first, and I can read the data when selecting all columns:

hive> select * from some_table limit 10;
OK
671764813   1424980760  fbx NtiwgY  6   2015    02  26
1632511524  1424980760  fbx AdMybO  10  2015    02  26
1201817175  1424980760  fbx GgQJEd  6   2015    02  26
1621940110  1424980760  fbx qmsXNQ  12  2015    02  26
326380277   1424980760  fbx zgVFgP  2   2015    02  26
1256744282  1424980760  fbx GeIFxq  6   2015    02  26
1741961976  1424980760  fbx CiuxZU  8   2015    02  26
2009923690  1424980760  fbx ZmGOvK  2   2015    02  26
1728798342  1424980760  fbx YikDcV  8   2015    02  26
688185292   1424980760  fbx NssSWN  7   2015    02  26

However, whenever I try to read or reference a specific field, the query fails:

hive> select another_id from some_table limit 10;
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:91)
    at org.apache.hadoop.fs.Path.<init>(Path.java:99)
    at org.apache.hadoop.fs.Path.<init>(Path.java:58)
    at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:745)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:849)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:774)
    at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:178)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:991)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:950)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

The same happens when a field is used in a where condition.
I can use partition fields in the where clause, so select * from some_table where year=2015 works fine, but select year from some_table limit 10 fails with the error above.
The files in HDFS look like this:
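One pattern worth noting: select * with only partition filters can be answered by a local fetch task, while projecting a specific column forces Hive to submit a MapReduce job, and the stack trace above fails inside job submission (JobClient.copyRemoteFiles) rather than in the SerDe itself. A common trigger for "Can not create a Path from an empty string" at that point is an empty entry in the auxiliary-jar configuration. A sketch of things to try, assuming that is the cause (the jar path below is hypothetical, adjust it to wherever the jar actually lives):

```sql
-- Ship the SerDe jar with the MapReduce job for this session.
ADD JAR /usr/lib/hive/lib/hcatalog-core-0.5.0-cdh4.7.0.jar;

-- Then retry the failing query.
SELECT another_id FROM some_table LIMIT 10;
```

Also check hive-site.xml and any --auxpath argument: if hive.aux.jars.path is set to an empty string (or contains a trailing comma producing an empty element), Hive will try to construct a Path from "" during job submission, which is exactly this exception.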

{"another_id":6,"user_id":671764813,"some_id":"NtiwgY","event_time":1424980760,"some_string":"fbx"}
{"another_id":10,"user_id":1632511524,"some_id":"AdMybO","event_time":1424980760,"some_string":"fbx"}
{"another_id":6,"user_id":1201817175,"some_id":"GgQJEd","event_time":1424980760,"some_string":"fbx"}
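As a sanity check on the data itself, independent of Hive, a small Python sketch can confirm that lines like the ones above actually match the declared table schema (field names and types taken from the CREATE TABLE statement):

```python
import json

# Expected schema, mirroring the Hive table definition above
# (Hive int -> Python int, Hive string -> Python str).
EXPECTED_TYPES = {
    "user_id": int,
    "event_time": int,
    "some_string": str,
    "some_id": str,
    "another_id": int,
}

def validate_line(line):
    """Parse one JSON line; True iff every expected key is present with the right type."""
    record = json.loads(line)
    return all(
        key in record and isinstance(record[key], expected)
        for key, expected in EXPECTED_TYPES.items()
    )

sample = [
    '{"another_id":6,"user_id":671764813,"some_id":"NtiwgY","event_time":1424980760,"some_string":"fbx"}',
    '{"another_id":10,"user_id":1632511524,"some_id":"AdMybO","event_time":1424980760,"some_string":"fbx"}',
]

print(all(validate_line(line) for line in sample))  # True if all lines match the schema
```

Since the sample lines validate cleanly, the data itself is unlikely to be the problem, which points back at the job configuration rather than the table definition.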

I hope it's just a problem with my table definition. Any help is appreciated.


gzjq41n4 (Answer 1):

I never got it working with the HCatalog SerDe. However, what I actually wanted was to store JSON in HDFS and read it as a Hive table, and I eventually succeeded with a different SerDe, which you can find here:
https://github.com/rcongiu/hive-json-serde
It works very well for me on CDH4.
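For reference, a table definition using that project's SerDe class (org.openx.data.jsonserde.JsonSerDe) might look like the sketch below. The jar path and file name are hypothetical and depend on the version you build or download:

```sql
-- Register the SerDe jar first (hypothetical path).
ADD JAR /path/to/json-serde-with-dependencies.jar;

CREATE EXTERNAL TABLE some_table_openx(
  user_id int,
  event_time int,
  some_string string,
  some_id string,
  another_id int)
PARTITIONED BY (year int, month int, day int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:8020/somedir/some_table';
```

STORED AS TEXTFILE expands to the same TextInputFormat/HiveIgnoreKeyTextOutputFormat pair used in the original table definition, so only the SerDe changes.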
