How do I read data from Hive using Apache Beam?

bzzcjhmw posted on 2021-06-26 in Hive

How do I read from Hive with Apache Beam, i.e. how can Hive be used as a source in an Apache Beam pipeline?

Answer #1 (szqfcxe2)

HadoopInputFormatIO can be used to read from Hive through HCatalog's HCatInputFormat, as follows:

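import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Tell HadoopInputFormatIO to read through HCatalog's InputFormat.
Configuration conf = new Configuration();
conf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
conf.setClass("key.class", LongWritable.class, WritableComparable.class);
conf.setClass("value.class", DefaultHCatRecord.class, Writable.class);
conf.set("hive.metastore.uris", "...");
// Must be the same conf object passed to the read below; setInput throws IOException.
HCatInputFormat.setInput(conf, "myDatabase", "myTable", "myFilter");

// p is the existing Pipeline; the type arguments must match key.class and value.class.
PCollection<KV<LongWritable, DefaultHCatRecord>> data =
    p.apply(HadoopInputFormatIO.<LongWritable, DefaultHCatRecord>read().withConfiguration(conf));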
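
As a minimal sketch of consuming the result (not part of the original answer), the hypothetical helper class below drops the keys with Beam's Values transform and turns each row into a string by reading column 0 with HCatRecord.get(int). The class and method names, and the assumption that column 0 is the interesting column, are illustrative only; firstColumn expects a PCollection shaped like data above.

```java
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.hadoop.io.LongWritable;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;

// Hypothetical helper; names and the choice of column 0 are illustrative assumptions.
public class HiveRecordsSketch {

  // Renders the first column of one HCatalog row as a string.
  static String firstColumnAsString(HCatRecord record) {
    return String.valueOf(record.get(0));
  }

  // Drops the LongWritable keys produced by HadoopInputFormatIO and
  // keeps one string per Hive row.
  static PCollection<String> firstColumn(
      PCollection<KV<LongWritable, DefaultHCatRecord>> data) {
    return data
        .apply("DropKeys", Values.<DefaultHCatRecord>create())
        .apply("FirstColumn",
            MapElements.into(TypeDescriptors.strings())
                .via((DefaultHCatRecord record) -> firstColumnAsString(record)));
  }
}
```

In a pipeline this would be called as HiveRecordsSketch.firstColumn(data). Keeping the per-row logic in a plain static method makes it testable without running a pipeline.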

Answer #2 (bd1hkmkf)

A pull request merged in July 2017 added Hive support to Beam 2.1.0 through HCatalog: https://issues.apache.org/jira/browse/beam-2357
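
As far as I know, that issue corresponds to the HCatalogIO connector (artifact beam-sdks-java-io-hcatalog), which reads a Hive table without configuring an InputFormat by hand. A minimal sketch, assuming that module is on the classpath; the class name, the metastore URI and the myDatabase/myTable names are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hive.hcatalog.data.HCatRecord;

// Hypothetical sketch class; database, table and URI values are placeholders.
public class HCatalogIOSketch {

  // Hive client properties passed to HCatalogIO; only the metastore URI is set here.
  static Map<String, String> metastoreProperties(String metastoreUri) {
    Map<String, String> props = new HashMap<>();
    props.put("hive.metastore.uris", metastoreUri);
    return props;
  }

  // Builds (but does not run) a read of one Hive table.
  static HCatalogIO.Read readTable(Map<String, String> props, String database, String table) {
    return HCatalogIO.read()
        .withConfigProperties(props)
        .withDatabase(database)
        .withTable(table);
  }

  // Applies the read to a pipeline; each element is one Hive row.
  static PCollection<HCatRecord> readRows(Pipeline p, String metastoreUri) {
    return p.apply("ReadFromHive",
        readTable(metastoreProperties(metastoreUri), "myDatabase", "myTable"));
  }
}
```

A withFilter(...) call can be added to the read when the table is partitioned; each element of the resulting PCollection is one HCatRecord.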