When reading an HBase (0.98.4.2.2.0.0) table into a Spark (1.2.0.2.2.0.0-82) RDD with PySpark in yarn-client mode (Hadoop 2.6.0) on the HDP 2.2 platform, I hit a strange exception:
2015-04-14 19:05:11,295 WARN [task-result-getter-0] scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 0.0 (TID 0, hadoop-node05.mathartsys.com): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I followed the Spark example Python code (https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py). My code is:
import sys

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="HBaseInputFormat")
    conf = {"hbase.zookeeper.quorum": "hadoop-node01.mathartsys.com,hadoop-node02.mathartsys.com,hadoop-node03.mathartsys.com",
            "hbase.mapreduce.inputtable": "test",
            "hbase.cluster.distributed": "true",
            "hbase.rootdir": "hdfs://hadoop-node01.mathartsys.com:8020/apps/hbase/data",
            "hbase.zookeeper.property.clientPort": "2181",
            "zookeeper.session.timeout": "30000",
            "zookeeper.znode.parent": "/hbase-unsecure"}
    keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf)
    output = hbase_rdd.collect()
    for (k, v) in output:
        print (k, v)
    sc.stop()
The job was submitted as follows:
spark-submit --master yarn-client --driver-class-path /opt/spark/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/*:/usr/hdp/current/hbase-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/* hbase_inputformat.py
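For context on why the classpath may matter here: `--driver-class-path` only adds these jars to the driver's classpath; the executor JVMs that YARN launches do not automatically see them, and a driver/executor classpath mismatch is a commonly reported cause of `java.lang.IllegalStateException: unread block data` during task deserialization. As a sketch only (the specific jar names below are placeholders, not verified against this HDP install), a variant that also ships the HBase jars to the executors might look like:

```shell
# Sketch: ship the HBase client jars to the executors with --jars,
# in addition to putting them on the driver's classpath.
# The jar file names are assumptions and must be adapted to the
# versioned file names actually present under /usr/hdp/current/.
spark-submit --master yarn-client \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar \
  --driver-class-path /usr/hdp/current/hbase-client/lib/* \
  hbase_inputformat.py
```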
My environment is:
CentOS 6.5
HDP 2.2
Spark 1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041
Can anyone suggest how to resolve this?
The full log is:
[root@hadoop-node03 hbase]# spark-submit --master yarn-client --driver-class-path /opt/spark/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/*:/usr/hdp/current/hbase-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/* hbase_test2.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/spark-examples-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-04-14 22:41:34,839 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: root
2015-04-14 22:41:34,846 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: root
2015-04-14 22:41:34,847 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
2015-04-14 22:41:35,459 INFO [sparkDriver-akka.actor.default-dispatcher-4] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2015-04-14 22:41:35,524 INFO [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2015-04-14 22:41:35,754 INFO [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver@hadoop-node03.mathartsys.com:44295]
2015-04-14 22:41:35,764 INFO [Thread-2] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 44295.
2015-04-14 22:41:35,790 INFO [Thread-2] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
2015-04-14 22:41:35,806 INFO [Thread-2] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering BlockManagerMaster
2015-04-14 22:41:35,826 INFO [Thread-2] storage.DiskBlockManager (Logging.scala:logInfo(59)) - Created local directory at /tmp/spark-local-20150414224135-a290
2015-04-14 22:41:35,832 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - MemoryStore started with capacity 265.4 MB
2015-04-14 22:41:36,535 WARN [Thread-2] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-14 22:41:36,823 INFO [Thread-2] spark.HttpFileServer (Logging.scala:logInfo(59)) - HTTP File server directory is /tmp/spark-b963d482-e9be-476b-85b0-94ab6cd8076c
2015-04-14 22:41:36,830 INFO [Thread-2] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-04-14 22:41:36,902 INFO [Thread-2] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-04-14 22:41:36,921 INFO [Thread-2] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector@0.0.0.0:58608
2015-04-14 22:41:36,925 INFO [Thread-2] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP file server' on port 58608.
2015-04-14 22:41:37,054 INFO [Thread-2] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2015-04-14 22:41:37,069 INFO [Thread-2] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SelectChannelConnector@0.0.0.0:4040
2015-04-14 22:41:37,070 INFO [Thread-2] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'SparkUI' on port 4040.
2015-04-14 22:41:37,073 INFO [Thread-2] ui.SparkUI (Logging.scala:logInfo(59)) - Started SparkUI at http://hadoop-node03.mathartsys.com:4040
2015-04-14 22:41:38,034 INFO [Thread-2] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://hadoop-node02.mathartsys.com:8188/ws/v1/timeline/
2015-04-14 22:41:38,220 INFO [Thread-2] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hadoop-node02.mathartsys.com/10.0.0.222:8050
2015-04-14 22:41:38,511 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Requesting a new application from cluster with 3 NodeManagers
2015-04-14 22:41:38,536 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Verifying our application has not requested more than the maximum memory capability of the cluster (15360 MB per container)
2015-04-14 22:41:38,537 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Will allocate AM container, with 896 MB memory including 384 MB overhead
2015-04-14 22:41:38,537 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Setting up container launch context for our AM
2015-04-14 22:41:38,544 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Preparing resources for our AM container
2015-04-14 22:41:39,125 WARN [Thread-2] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-04-14 22:41:39,207 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/opt/spark/spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041/lib/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar -> hdfs://hadoop-node01.mathartsys.com:8020/user/root/.sparkStaging/application_1428915066363_0013/spark-assembly-1.2.0.2.2.0.0-82-hadoop2.6.0.2.2.0.0-2041.jar
2015-04-14 22:41:40,428 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/root/hbase/hbase_test2.py -> hdfs://hadoop-node01.mathartsys.com:8020/user/root/.sparkStaging/application_1428915066363_0013/hbase_test2.py
2015-04-14 22:41:40,511 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Setting up the launch environment for our AM container
2015-04-14 22:41:40,564 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: root
2015-04-14 22:41:40,564 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: root
2015-04-14 22:41:40,565 INFO [Thread-2] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
2015-04-14 22:41:40,568 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Submitting application 13 to ResourceManager
2015-04-14 22:41:40,609 INFO [Thread-2] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(251)) - Submitted application application_1428915066363_0013
2015-04-14 22:41:41,615 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: ACCEPTED)
2015-04-14 22:41:41,621 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1429022500586
final status: UNDEFINED
tracking URL: http://hadoop-node02.mathartsys.com:8088/proxy/application_1428915066363_0013/
user: root
2015-04-14 22:41:42,624 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: ACCEPTED)
2015-04-14 22:41:43,627 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: ACCEPTED)
2015-04-14 22:41:44,631 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: ACCEPTED)
2015-04-14 22:41:45,635 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: ACCEPTED)
2015-04-14 22:41:46,278 INFO [sparkDriver-akka.actor.default-dispatcher-4] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@hadoop-node05.mathartsys.com:42992/user/YarnAM#708767775]
2015-04-14 22:41:46,284 INFO [sparkDriver-akka.actor.default-dispatcher-4] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop-node02.mathartsys.com, PROXY_URI_BASES -> http://hadoop-node02.mathartsys.com:8088/proxy/application_1428915066363_0013), /proxy/application_1428915066363_0013
2015-04-14 22:41:46,287 INFO [sparkDriver-akka.actor.default-dispatcher-4] ui.JettyUtils (Logging.scala:logInfo(59)) - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2015-04-14 22:41:46,638 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1428915066363_0013 (state: RUNNING)
2015-04-14 22:41:46,639 INFO [Thread-2] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: N/A
ApplicationMaster host: hadoop-node05.mathartsys.com
ApplicationMaster RPC port: 0
queue: default
start time: 1429022500586
final status: UNDEFINED
tracking URL: http://hadoop-node02.mathartsys.com:8088/proxy/application_1428915066363_0013/
user: root
2015-04-14 22:41:46,641 INFO [Thread-2] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - Application application_1428915066363_0013 has started running.
2015-04-14 22:41:46,795 INFO [Thread-2] netty.NettyBlockTransferService (Logging.scala:logInfo(59)) - Server created on 56053
2015-04-14 22:41:46,797 INFO [Thread-2] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Trying to register BlockManager
2015-04-14 22:41:46,800 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManagerMasterActor (Logging.scala:logInfo(59)) - Registering block manager hadoop-node03.mathartsys.com:56053 with 265.4 MB RAM, BlockManagerId(<driver>, hadoop-node03.mathartsys.com, 56053)
2015-04-14 22:41:46,803 INFO [Thread-2] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Registered BlockManager
2015-04-14 22:41:55,529 INFO [sparkDriver-akka.actor.default-dispatcher-3] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - Registered executor: Actor[akka.tcp://sparkExecutor@hadoop-node06.mathartsys.com:42500/user/Executor#-374031537] with ID 2
2015-04-14 22:41:55,560 INFO [sparkDriver-akka.actor.default-dispatcher-3] util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved hadoop-node06.mathartsys.com to /default-rack
2015-04-14 22:41:55,653 INFO [sparkDriver-akka.actor.default-dispatcher-4] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - Registered executor: Actor[akka.tcp://sparkExecutor@hadoop-node04.mathartsys.com:54112/user/Executor#35135131] with ID 1
2015-04-14 22:41:55,655 INFO [sparkDriver-akka.actor.default-dispatcher-4] util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved hadoop-node04.mathartsys.com to /default-rack
2015-04-14 22:41:55,690 INFO [Thread-2] cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2015-04-14 22:41:55,998 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(298340) called with curMem=0, maxMem=278302556
2015-04-14 22:41:56,001 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory (estimated size 291.3 KB, free 265.1 MB)
2015-04-14 22:41:56,160 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(44100) called with curMem=298340, maxMem=278302556
2015-04-14 22:41:56,161 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 43.1 KB, free 265.1 MB)
2015-04-14 22:41:56,163 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_0_piece0 in memory on hadoop-node03.mathartsys.com:56053 (size: 43.1 KB, free: 265.4 MB)
2015-04-14 22:41:56,164 INFO [Thread-2] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_0_piece0
2015-04-14 22:41:56,167 INFO [Thread-2] spark.DefaultExecutionContext (Logging.scala:logInfo(59)) - Created broadcast 0 from newAPIHadoopRDD at PythonRDD.scala:516
2015-04-14 22:41:56,204 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(298388) called with curMem=342440, maxMem=278302556
2015-04-14 22:41:56,205 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1 stored as values in memory (estimated size 291.4 KB, free 264.8 MB)
2015-04-14 22:41:56,279 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(44100) called with curMem=640828, maxMem=278302556
2015-04-14 22:41:56,279 INFO [Thread-2] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_1_piece0 stored as bytes in memory (estimated size 43.1 KB, free 264.8 MB)
2015-04-14 22:41:56,281 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_1_piece0 in memory on hadoop-node03.mathartsys.com:56053 (size: 43.1 KB, free: 265.3 MB)
2015-04-14 22:41:56,281 INFO [Thread-2] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_1_piece0
2015-04-14 22:41:56,283 INFO [Thread-2] spark.DefaultExecutionContext (Logging.scala:logInfo(59)) - Created broadcast 1 from broadcast at PythonRDD.scala:497
2015-04-14 22:41:56,286 INFO [Thread-2] python.Converter (Logging.scala:logInfo(59)) - Loaded converter: org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter
2015-04-14 22:41:56,287 INFO [Thread-2] python.Converter (Logging.scala:logInfo(59)) - Loaded converter: org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter
2015-04-14 22:41:56,400 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManagerMasterActor (Logging.scala:logInfo(59)) - Registering block manager hadoop-node06.mathartsys.com:39033 with 530.3 MB RAM, BlockManagerId(2, hadoop-node06.mathartsys.com, 39033)
2015-04-14 22:41:56,434 INFO [sparkDriver-akka.actor.default-dispatcher-4] storage.BlockManagerMasterActor (Logging.scala:logInfo(59)) - Registering block manager hadoop-node04.mathartsys.com:33968 with 530.3 MB RAM, BlockManagerId(1, hadoop-node04.mathartsys.com, 33968)
......