My EMR streaming job frequently fails with the following error:
2014-10-15 18:36:36,560 ERROR [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Exception.
java.io.IOException: Exception reading /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1413396780703_0003/container_1413396780703_0003_01_000218/container_tokens
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:177)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:744)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:703)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:605)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:98)
Caused by: java.io.FileNotFoundException: /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1413396780703_0003/container_1413396780703_0003_01_000218/container_tokens (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:172)
    ... 4 more
The failure is nondeterministic, but it happens far too often on large clusters. This is how I launch the cluster:
elastic-mapreduce --create --alive --instance-group master --instance-type m1.large \
--instance-count 1 \
--instance-group core --instance-type r3.xlarge \
--instance-count 200 --hadoop-version "2.4.0" \
--ami-version "3.2.1" --enable-debugging --json ./emr_config \
--bootstrap-action 's3://path/to/bootstrap.sh' --bootstrap-name Bootstrap
This is the step configuration (emr_config):
[
  {
    "Name": "Step Name",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args": [
        "-files", "s3://path/to/mapper.py",
        "-input", "s3://path/to/input/",
        "-output", "s3://path/to/output/",
        "-mapper", "mapper.py",
        "-reducer", "/bin/cat",
        "-jobconf", "mapreduce.map.java.opts=-Xmx22528m",
        "-jobconf", "mapreduce.map.memory.mb=23424",
        "-jobconf", "mapreduce.task.timeout=24000000",
        "-jobconf", "mapreduce.job.maps=200",
        "-jobconf", "mapreduce.tasktracker.map.tasks.maximum=1",
        "-jobconf", "mapred.map.tasks.speculative.execution=false"
      ]
    }
  }
]
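For anyone who wants to inspect these settings programmatically, here is a minimal Python sketch that pulls the -jobconf pairs out of a step config shaped like the one above and compares the map JVM heap with the map container size. The function names and the inline sample are hypothetical, and the parsing assumes the heap option has the exact form -Xmx<N>m. It only reads the settings; whether they have anything to do with the container_tokens failure is unknown.

```python
import json


def jobconf_settings(step):
    """Return the -jobconf key=value pairs of one step as a dict (hypothetical helper)."""
    args = step["HadoopJarStep"]["Args"]
    settings = {}
    # Walk consecutive (flag, value) pairs and keep those introduced by -jobconf.
    for flag, value in zip(args, args[1:]):
        if flag == "-jobconf":
            key, _, val = value.partition("=")
            settings[key] = val
    return settings


def heap_to_container_ratio(settings):
    """Ratio of the map JVM heap to the map container size, both in MB."""
    opts = settings["mapreduce.map.java.opts"]   # e.g. "-Xmx22528m"
    heap_mb = int(opts[len("-Xmx"):-1])          # assumes the exact form -Xmx<N>m
    container_mb = int(settings["mapreduce.map.memory.mb"])
    return heap_mb / container_mb


# Trimmed inline copy of the step config, used only for illustration.
SAMPLE = """
[{"Name": "Step Name",
  "HadoopJarStep": {"Args": [
    "-mapper", "mapper.py",
    "-jobconf", "mapreduce.map.java.opts=-Xmx22528m",
    "-jobconf", "mapreduce.map.memory.mb=23424"]}}]
"""

if __name__ == "__main__":
    for step in json.loads(SAMPLE):
        conf = jobconf_settings(step)
        print(step["Name"], conf)
        print("heap/container ratio: %.2f" % heap_to_container_ratio(conf))
```

To run it against the real file, json.loads(SAMPLE) could be replaced with json.load on an open handle to emr_config.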
Does anyone know the root cause of this problem, or a workaround?