I deployed Hadoop 2.4 on AWS EC2 and replaced HDFS with the S3 native file system (s3n). I tried several of the example applications, and every one of them gave me the stack trace below (an old thread from July 24 is sitting there unresolved... so I am attaching the debug output here...):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount s3n://mybkt/wc/ s3n://mybkt/out
14/08/12 21:57:35 DEBUG util.Shell: setsid exited with exit code 0
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
14/08/12 21:57:36 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
14/08/12 21:57:36 DEBUG security.Groups: Creating new Groups object
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: java.library.path=/home/ubuntu/hadoop-2.4.0/lib
14/08/12 21:57:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
14/08/12 21:57:36 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login commit
14/08/12 21:57:36 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: ubuntu
14/08/12 21:57:36 DEBUG security.UserGroupInformation: UGI loginUser:ubuntu (auth:SIMPLE)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.https-only=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: storage-service.internal-error-retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.default-storage-class=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.server-side-encryption=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.user-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.product-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.requester-pays-buckets-enabled=false
14/08/12 21:57:36 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
14/08/12 21:57:36 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
14/08/12 21:57:37 INFO client.RMProxy: Connecting to ResourceManager at /172.31.20.187:8032
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:130)
14/08/12 21:57:37 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
14/08/12 21:57:37 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
14/08/12 21:57:37 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@7d66036e
14/08/12 21:57:37 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@71cebfd2
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3n
14/08/12 21:57:37 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Here are my configuration files:
yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>172.31.20.187:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>172.31.20.187:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>172.31.20.187:8030</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/ubuntu/hdfs/tmp</value>
</property>
mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>640</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>640</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>172.31.20.187:8021</value>
</property>
I also configured access to AWS S3 (in core-site.xml) following this link: https://wiki.apache.org/hadoop/amazons3
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>s3n://mybkt</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>123</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>456</value>
</property>
I also tried Hadoop v1 and found that Hadoop 1 works fine with the s3n file system, but the same setup does not seem to work with Hadoop v2.
Please help. Thanks in advance.
1 Answer
Come to think of it, swapping in S3 (or any other file system implementation) as a replacement for HDFS/the namenode cannot be that simple. I tried the same thing with the Tachyon file system and it failed too; see https://groups.google.com/forum/#!topic/tachyon-users/u4oobekgiga
Look at how others have approached it:
Add an AbstractFileSystem implementation for the scheme (see the sketch after this list).
Patch Hadoop so that it does not check permissions on the staging artifacts.
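The stack trace above points at exactly this gap: "No AbstractFileSystem for scheme: s3n" is raised from FileContext.getAbstractFileSystem, and FileContext (which the YARN client uses) only resolves schemes registered through fs.AbstractFileSystem.<scheme>.impl. Below is a minimal sketch of such a shim, loosely modeled on how later Hadoop releases wire up s3a; the package and class names are made up, and this only removes the immediate error, it does not turn s3n into a full HDFS replacement:

package com.example.fs;  // hypothetical package name

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

// AbstractFileSystem shim for the s3n scheme: it delegates every call to the
// existing NativeS3FileSystem so that FileContext (and therefore the YARN
// client) can resolve s3n URIs.
public class S3NativeAbstractFs extends DelegateToFileSystem {
  public S3NativeAbstractFs(URI theUri, Configuration conf)
      throws IOException, URISyntaxException {
    // "s3n" is the supported scheme; strict authority checking is left off,
    // mirroring how Hadoop registers its other object-store schemes.
    super(theUri, new NativeS3FileSystem(), conf, "s3n", false);
  }
}

Put the class on the cluster classpath and register it in core-site.xml:

<property>
<name>fs.AbstractFileSystem.s3n.impl</name>
<value>com.example.fs.S3NativeAbstractFs</value>
</property>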
In short, you can use S3 to read a job's input and write its output, but using it as an HDFS replacement for the metadata layer is not supported!
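For this setup, that suggests a core-site.xml roughly like the following (a sketch only; the namenode address and port are assumptions, since the question does not show them): keep the default file system on HDFS, keep the s3n credential properties, and pass the s3n:// paths only as the wordcount input and output, exactly as in the command line at the top of the question.

<property>
<name>fs.defaultFS</name>
<value>hdfs://172.31.20.187:9000</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>123</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>456</value>
</property>

With that, job staging and metadata go through HDFS, while s3n://mybkt/wc/ and s3n://mybkt/out are used purely as input and output locations.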