pyspark:connectionreseterror:[winerror 10054]远程主机已强制关闭现有连接

tv6aics1  于 2021-05-27  发布在  Spark
关注(0)|答案(0)|浏览(636)

我对Pypark还很陌生。在pycharm中运行下面的代码时,我得到了所需的预期输出。但我的错误率越来越低

Traceback (most recent call last):
  File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:49748)
Traceback (most recent call last):
  File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

**Process finished with exit code 0**

正如您在最后一行中所看到的,进程以退出代码0结束,我也得到了预期的输出

这是我的代码示例
python-3.7版
Spark-2.4.5

def func(row):
    temp=row.asDict()
    temp["concat_val"]="|".join([str(x) for x in row])
    put=Row(**temp)
    return put

if __name__ == "__main__":
     spark = SparkSession\
        .builder.\
        master("local[*]")\
        .appName("PythonWordCount")\
        .getOrCreate()

    data1=spark.createDataFrame(
        [
            ("1", 'foo'),  
            ("2", 'bar'),
        ],
        ['id', 'txt'] 
    row_rdd = data1.rdd.map(func)
    print(row_rdd.collect())
    concat_df = row_rdd.toDF()
    hash_df = concat_df.withColumn("hash_id", md5(F.col("concat_val")))
    hash_df.show()

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题