如何为hadoop流定义或解决此错误?

r55awzrz  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(551)

我在hadoop mr job中遇到了一些错误,如何为hadoop流定义这个问题?

Error: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:63)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

Error: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:63)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

(很遗憾,我没有发布任何源代码的权限)

nukf8bse

nukf8bse1#

这些错误日志帮助不大。由于您没有共享代码的权限,因此可以尝试以下步骤。
检查代码中使用的依赖库是否存在于hadoop集群的所有节点中。这是必需的,因为任务可以在任何工作节点中执行。
获取一个示例输入文件并在本地执行代码,然后将其作为mapreduce程序运行。您可以按以下方式在本地执行它。
cat sampleinput | pythonMap代码.py

cmssoen2

cmssoen22#

我希望你能解决这个问题。但是可能有一些空文件构成了您正在查询的表或目录/批读取的一部分。
运行一个检查来防止出现大小为0的文件会很好。

相关问题