Spark on Mesos cluster loses connectivity between workers

Posted by muk1a3rh on 2021-05-24 in Spark

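I have one master and 9 slaves, with 30 GB of RAM in total across the cluster.
It does not happen every time, but I lose connectivity between workers.
The data volume is small (< 500 MB), and I can run the same query without problems on a Docker cluster on my laptop. What is the problem here, or how should I approach it?
At some point during a complex filter, this error appears on stderr: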

20/10/08 12:27:39 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from XXXXXX:34483
java.io.IOException: Failed to connect to XXXXX/XXXXX:34483
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: dub901mps501.kubikdata.aws/172.31.6.10:34483
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    ... 2 more
Caused by: java.net.ConnectException: Connection refused

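Looking at the failed stages, it always seems to fail in a count. What is going on here? The dataset is small, so is it a memory problem? Shuffle read/write is < 1 KB.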

count at NativeMethodAccessorImpl.java:0 +details

org.apache.spark.sql.Dataset.count(Dataset.scala:2835)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

I checked the ports, and they are open.
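For reference, a port that is open in the firewall can still return "Connection refused" if no process is listening on it at that moment (for example, if the executor that served the block has already exited). Below is a minimal sketch of how the check could be repeated from each worker while the job is running. The helper names `can_connect` and `check_all` are hypothetical, and the host and port in the usage note are simply the ones from the trace above.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False


def check_all(targets: Iterable[Tuple[str, int]],
              timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Probe every (host, port) pair and return a reachability map."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Usage, run from another worker while the failing stage is active: `check_all([("172.31.6.10", 34483)])`. A `False` result for a port that the security group allows would suggest that nothing is listening there, rather than that traffic is being blocked.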
