I added the flink-hadoop-compatibility dependency to a project that reads sequence files from an HDFS path:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.5.6</version>
</dependency>
Here is the Java code snippet:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path));
When I run it in Eclipse it works fine, but when I submit it from the command line with "flink run ...", it complains:
The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.
OK, so I updated the code to add the type information explicitly:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path),
        TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));
Now it complains:
Caused by: java.lang.RuntimeException: Could not load the TypeInformation for the class 'org.apache.hadoop.io.Writable'. You may be missing the 'flink-hadoop-compatibility' dependency.
Someone suggested copying flink-hadoop-compatibility_2.11-1.5.6.jar into FLINK_HOME/lib, but that did not help; I still got the same error.
Does anyone have any idea?
My Flink is a standalone installation, version 1.5.6.
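For reference, here is a minimal self-contained version of what I am running, with the imports it needs (the class name, the HDFS path, and the final count are just placeholders for this post):
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class ReadSequenceFile {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        final String path = "hdfs:///path/to/sequence/files"; // placeholder path

        // Read <NullWritable, BytesWritable> records from the sequence files,
        // passing the TypeInformation explicitly as the error message asks for.
        DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(
                HadoopInputs.readHadoopFile(
                        new SequenceFileInputFormat<NullWritable, BytesWritable>(),
                        NullWritable.class, BytesWritable.class, path),
                TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));

        // Placeholder sink so the job executes; the real program does more with the records.
        System.out.println("record count: " + input.count());
    }
}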
Update:
Sorry, I had copied flink-hadoop-compatibility_2.11-1.5.6.jar to the wrong place; after fixing that, it works.
Now my question is: is there another way? Copying the jar into FLINK_HOME/lib really does not seem like a good idea to me, especially on a large Flink cluster.
1 Answer
This has been fixed in version 1.9.0; see https://issues.apache.org/jira/browse/flink-12163 for details.
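If I read that ticket correctly (not verified on every setup), from 1.9.0 on it should be enough to package flink-hadoop-compatibility into the job's fat jar instead of copying it into FLINK_HOME/lib, for example with the maven-shade-plugin; a minimal sketch (the plugin version is an assumption):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.1.1</version> <!-- assumed version -->
    <executions>
        <execution>
            <!-- build the fat jar during package; compile-scope dependencies
                 such as flink-hadoop-compatibility_2.11 end up inside it -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>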