I am new to PySpark. When I run the code below in PyCharm, I get the expected output, but I am also getting the errors below:
Traceback (most recent call last):
File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:49748)
Traceback (most recent call last):
File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Study\Spark\spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
**Process finished with exit code 0**
As you can see in the last line, the process finished with exit code 0, and I also got the expected output.
Here is my code sample (Python 3.7, Spark 2.4.5):
from pyspark.sql import SparkSession, Row
from pyspark.sql import functions as F
from pyspark.sql.functions import md5

def func(row):
    # Convert the Row to a dict, append a pipe-delimited concatenation
    # of all column values, and rebuild the Row.
    temp = row.asDict()
    temp["concat_val"] = "|".join([str(x) for x in row])
    put = Row(**temp)
    return put

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .master("local[*]") \
        .appName("PythonWordCount") \
        .getOrCreate()

    data1 = spark.createDataFrame(
        [
            ("1", 'foo'),
            ("2", 'bar'),
        ],
        ['id', 'txt']
    )

    row_rdd = data1.rdd.map(func)
    print(row_rdd.collect())

    concat_df = row_rdd.toDF()
    hash_df = concat_df.withColumn("hash_id", md5(F.col("concat_val")))
    hash_df.show()
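I have seen it suggested that this kind of shutdown noise happens when the Python interpreter exits while py4j is still trying to reach the JVM gateway, and that stopping the session explicitly avoids it. A minimal sketch of that pattern, assuming the job itself completed normally (the try/finally structure is my illustration, not from my original script):

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("PythonWordCount") \
    .getOrCreate()
try:
    # ... run the DataFrame/RDD work shown above ...
    pass
finally:
    # Close the py4j gateway to the JVM cleanly; otherwise the interpreter
    # may exit while py4j threads are still trying to reach a Java server
    # that has already gone away.
    spark.stop()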