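My MapReduce job on Amazon EMR fails because, if the first reduce attempt fails to copy its results to S3, the output file has already been created (possibly as a partial file), and subsequent reduce attempts refuse to write to a file that already exists.
First attempt log: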
014-11-30 06:56:19,774 INFO [main] com.amazonaws.latency: StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: null; Request ID: removed), S3 Extended Request ID: removed=], ServiceName=[Amazon S3], AWSErrorCode=[null], AWSRequestID=[removed], ServiceEndpoint=[https://devel.rui.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[130.087], HttpRequestTime=[118.72], HttpClientReceiveResponseTime=[32.585], RequestSigningTime=[0.646], HttpClientSendRequestTime=[0.835],
2014-11-30 06:56:19,803 INFO [main] com.amazonaws.latency: StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: null; Request ID: removed), S3 Extended Request ID: 1removed=], ServiceName=[Amazon S3], AWSErrorCode=[null], AWSRequestID=[removed], ServiceEndpoint=[https://removed.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[27.899], HttpRequestTime=[26.898], HttpClientReceiveResponseTime=[9.405], RequestSigningTime=[0.559], HttpClientSendRequestTime=[1.016],
2014-11-30 06:56:19,939 INFO [main] com.amazonaws.latency: StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[removed], ServiceEndpoint=[https://removedi.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[127.219], HttpRequestTime=[20.791], HttpClientReceiveResponseTime=[15.467], RequestSigningTime=[0.391], ResponseProcessingTime=[82.617], HttpClientSendRequestTime=[0.955],
2014-11-30 06:56:19,999 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
Retry logs (all of them look the same):
RequestSigningTime=[0.663], ResponseProcessingTime=[12.466], HttpClientSendRequestTime=[0.832],
2014-11-30 07:23:56,526 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child :
java.io.IOException: File already exists:s3n://removed/removed/part-r-00005.gz
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:615)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:788)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:169)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:548)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:622)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
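Interestingly, if I open the part file (part-r-00005.gz), it does contain data, and in the expected format.
Any ideas on how to solve this, and how to go about either of the following? a) Increase the processing delay (for example, raise a timeout). b) Have the retry delete the existing file if it is already there.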
1 Answer
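You can change the job so that it writes its output to a temporary directory, named after the job ID or a timestamp to guarantee uniqueness, and then moves the contents to the desired output location once processing has finished. That way, if processing fails after part of the output has been written, the desired output directory is left untouched. It also means you will never accidentally read the partial output of a failed job.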
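As a rough illustration of that pattern, here is a minimal sketch that uses plain `java.nio.file` on a local filesystem as a stand-in for the job's real output store. The class name `StagedOutput`, the helpers `stagingDir` and `promote`, the `_staging_` prefix, and the sample job ID are all hypothetical names made up for this example; on EMR the same steps would have to go through whatever filesystem API your job actually uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StagedOutput {

    // Hypothetical helper: derive a unique staging directory from the job ID
    // plus a timestamp, so a rerun never collides with an earlier attempt.
    static Path stagingDir(Path base, String jobId) {
        return base.resolve("_staging_" + jobId + "_" + System.currentTimeMillis());
    }

    // Hypothetical helper: move every file from the staging directory into
    // the final output directory, then remove the empty staging directory.
    // Only call this after the job has finished successfully.
    static void promote(Path staging, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        List<Path> parts;
        try (Stream<Path> files = Files.list(staging)) {
            parts = files.collect(Collectors.toList());
        }
        for (Path src : parts) {
            // Without REPLACE_EXISTING this throws if the target already exists.
            Files.move(src, finalDir.resolve(src.getFileName()));
        }
        Files.delete(staging);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("job-demo");
        Path staging = stagingDir(base, "job_0001"); // assumed job ID
        Path finalDir = base.resolve("output");

        // Stand-in for the job writing its output into the staging directory.
        Files.createDirectories(staging);
        Files.write(staging.resolve("part-r-00005"),
                "key\tvalue\n".getBytes(StandardCharsets.UTF_8));

        // The final directory is touched only once the job has succeeded.
        promote(staging, finalDir);
        System.out.println("promoted: " + Files.exists(finalDir.resolve("part-r-00005")));
    }
}
```

If the job dies before `promote` is called, only the uniquely named staging directory contains partial files, so a rerun gets a fresh staging directory and the final location stays clean.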