I apologize for asking a question I have seen here before, but none of the answers I found seem to resolve it. I followed the installation docs for running PySpark on my local machine. Once that was done, I tried to run:
# Start pyspark via provided command
import pyspark
# Below code is Spark 2+
spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()
spark.range(10).collect()
but I keep getting the following error:
/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/bin/spark-class: line 71: /usr/bin/java/bin/java: Not a directory
Traceback (most recent call last):
File "test.py", line 5, in <module>
spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()
File "/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/usr123/opt/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
Has anyone found a good way to fix this? Is there something obvious I'm missing?
1 Answer
We ran into a similar problem, and for us downgrading Python to 3.6 resolved it; in our case it appeared to be an incompatibility with the conda environment. Which Spark version are you trying to run? There can be substantial differences between 2.1, 2.3, and 2.4.
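Separately from the Python version, the first line of your traceback hints at a second issue: `spark-class` builds the launcher path as `$JAVA_HOME/bin/java`, so the error `/usr/bin/java/bin/java: Not a directory` suggests `JAVA_HOME` is set to the `java` binary itself rather than to the JDK root. A minimal sketch to check this before creating the session (the printed suggestion is macOS-specific, since your paths look like macOS):

```python
import os

# spark-class runs "$JAVA_HOME/bin/java", so JAVA_HOME must be the JDK
# root directory, not the path of the java executable itself.
java_home = os.environ.get("JAVA_HOME", "")
java_binary = os.path.join(java_home, "bin", "java")

if not java_home or not os.path.isfile(java_binary):
    print(f"JAVA_HOME looks wrong: {java_home!r}")
    print("Point it at the JDK root, e.g. the output of "
          "/usr/libexec/java_home on macOS, then retry.")
else:
    print(f"Java launcher found: {java_binary}")
```

If `JAVA_HOME` is `/usr/bin/java` (as your error implies), the check above reproduces exactly the broken `/usr/bin/java/bin/java` path that Spark tried to execute.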