HDP 3.1.4 - Spark2 Hive Warehouse Connector error when using spark-submit and pyspark shell: KeeperErrorCode = ConnectionLoss

edqdpe6u asked on 2021-05-27 in Hadoop

Environment:
HDP 3.1.4 - configured and tested
Hive Server 2 - tested and working
Hive Server 2 LLAP - tested and working
Spark configured as per documentation to use the Hive Warehouse Connector (HWC)
Apache Zeppelin - spark2 interpreter configured to use HWC
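For context, the HWC wiring in spark2-defaults follows the HDP 3.1 documentation. A minimal sketch of the relevant properties, assuming placeholder host names and the default @llap0 LLAP application name (illustrative, not my actual values):

# spark2-defaults (set via Ambari); host names are placeholders
spark.sql.hive.hiveserver2.jdbc.url              jdbc:hive2://HS2HOST.machine:10000/
spark.datasource.hive.warehouse.metastoreUri     thrift://METASTORE.machine:9083
spark.datasource.hive.warehouse.load.staging.dir /tmp
spark.hadoop.hive.llap.daemon.service.hosts      @llap0
spark.hadoop.hive.zookeeper.quorum               ZK01.machine:2181,ZK02.machine:2181,ZK03.machine:2181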
I am trying to execute the following script:

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

# Create spark session

spark = SparkSession.builder.appName("LLAP Test - CLI").enableHiveSupport().getOrCreate()

# Create HWC session

hive = HiveWarehouseSession.session(spark).userPassword('hive','hive').build()

# Execute a query to read from Spark using HWC

hive.executeQuery("select * from wifi_table where partit='2019-12-02'").show(20)

Problem: when submitting the application with spark-submit, or when running the above script in the pyspark shell (or any script that executes a query through HiveWarehouseSession), the Spark job hangs and then fails with the exception: java.lang.RuntimeException: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
The command executed is the following:

$ /usr/hdp/current/spark2-client/bin/spark-submit --master yarn --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip spark_compare_test.py
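The pyspark shell fails in the same way. It is launched with the same HWC artifacts; this is the standard invocation, shown here for completeness:

$ /usr/hdp/current/spark2-client/bin/pyspark --master yarn \
    --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar \
    --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip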

Here is the stack trace:

[...]
20/01/03 12:39:55 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
20/01/03 12:39:56 INFO DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
20/01/03 12:39:56 INFO DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
20/01/03 12:39:56 INFO DAGScheduler: Parents of final stage: List()
20/01/03 12:39:56 INFO DAGScheduler: Missing parents: List()
20/01/03 12:39:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
20/01/03 12:39:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 9.5 KB, free 366.3 MB)
20/01/03 12:39:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.6 KB, free 366.3 MB)
20/01/03 12:39:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on EDGE01.machine:38050 (size: 3.6 KB, free: 366.3 MB)
20/01/03 12:39:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
20/01/03 12:39:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
20/01/03 12:39:56 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
20/01/03 12:39:56 WARN TaskSetManager: Stage 0 contains a task of very large size (465 KB). The maximum recommended task size is 100 KB.
20/01/03 12:39:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, DN02.machine, executor 2, partition 0, NODE_LOCAL, 476705 bytes)
20/01/03 12:39:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on DN02.machine:41521 (size: 3.6 KB, free: 366.3 MB)
20/01/03 12:42:08 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, DN02.machine, executor 2): java.lang.RuntimeException: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:66)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:619)
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:422)
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:63)
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:181)
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:177)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstanceForHost(LlapBaseInputFormat.java:415)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:397)
    at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:160)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.getRecordReader(HiveWarehouseDataReader.java:72)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.<init>(HiveWarehouseDataReader.java:50)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.getDataReader(HiveWarehouseDataReaderFactory.java:72)
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:64)
    ... 18 more
Caused by: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at shadecurator.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
    at shadecurator.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
    at shadecurator.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
    at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
    at shadecurator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
    at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
    at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
    at shadecurator.org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
    at shadecurator.org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:326)
    at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:303)
    at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:597)
    ... 29 more
[...]

I have tried the following, with no effect whatsoever:
Checked ZooKeeper health and connection limits
Changed the ZooKeeper host
Raised the ZooKeeper timeout to 10 s, 120 s and even 600 s, with no effect (see the invocation sketch after this list)
Tried to submit the application from multiple nodes; the error persists
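For the timeout attempts, the properties were passed roughly as below. The property names come from HiveConf (hive.zookeeper.session.timeout, hive.zookeeper.connection.max.retries); whether the spark.hadoop. prefix actually propagates them to the executor-side LLAP registry is an assumption on my part:

# Same submit command as above, with ZooKeeper timeout/retry overrides added
$ /usr/hdp/current/spark2-client/bin/spark-submit --master yarn \
    --conf spark.hadoop.hive.zookeeper.session.timeout=600000ms \
    --conf spark.hadoop.hive.zookeeper.connection.max.retries=5 \
    --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar \
    --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip \
    spark_compare_test.py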
There is another strange behavior: when the script is run from the Zeppelin spark2 interpreter there is no error and HWC works fine. I have compared the two environments and there is no configuration mismatch in the main variables.
At this point I am stuck and do not know where to look for further troubleshooting. I can add more information on request.

No answers yet.

