Configuring PySpark on a Mac to access a remote Hadoop cluster

l7wslrjt · posted 2021-05-29 in Hadoop

I need to set up PySpark on my machine to access and read data on a remote Hadoop cluster, but I am running into some problems.
These are the steps I followed:

1. Install Spark via Homebrew:

   ```
   brew install apache-spark
   ```

2. Set the Spark and Python paths:

   ```
   export SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.1
   export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH
   PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
   export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
   ```

3. Point Hadoop at the remote cluster (a quick sanity check follows this list):

   ```
   export HADOOP_USER_NAME=hdfs
   export HADOOP_CONF_DIR=yarnconfig
   ```
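To confirm the environment actually resolves before going further, a minimal sanity check (a sketch, assuming the Homebrew layout from the steps above):

```
ls -l "$SPARK_HOME/libexec/bin/load-spark-env.sh"    # should exist and be executable
python -c "import pyspark; print(pyspark.__file__)"  # should resolve via the PYTHONPATH set above
```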

In `yarnconfig` I have this `yarn-site.xml`:

```
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>{Hadoop_Cluster_IP}</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8050</value>
  </property>
</configuration>
```
Here `{Hadoop_Cluster_IP}` is a placeholder for the IP address of the Hadoop cluster I am trying to connect to; I am not showing the real address for security reasons.
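As a quick check that the ResourceManager address from this config is even reachable from the Mac (a sketch; the IP stays a placeholder, and 8050 is the port from the config above):

```
nc -vz {Hadoop_Cluster_IP} 8050
```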
Then, in a Python shell:

```
from pyspark import SparkContext, SparkConf

conf = SparkConf().setMaster("local").setAppName("LogParser")
sc = SparkContext(conf=conf)
```
but I get the following error message:

```
/usr/local/Cellar/apache-spark/1.6.1/bin/load-spark-env.sh: line 2: /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh: Permission denied
/usr/local/Cellar/apache-spark/1.6.1/bin/load-spark-env.sh: line 2: exec: /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh: cannot execute: Undefined error: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
```
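The first two lines of the output suggest that `libexec/bin/load-spark-env.sh` is not executable, which would also explain the Java gateway dying before it reports a port. One way to inspect and, if that turns out to be the problem, restore the execute bit (a diagnostic sketch, not a confirmed fix):

```
ls -l /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh
# if the execute (x) bit is missing, restoring it is one candidate fix (assumption)
chmod +x /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh
```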

Any idea what is going wrong?
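For context, the end goal is to run against the remote cluster rather than locally; with Spark 1.6 that means a YARN master string instead of "local", roughly like this (a sketch; the HDFS path is a made-up placeholder):

```
from pyspark import SparkContext, SparkConf

# "yarn-client" is the Spark 1.x master string for a client-mode YARN driver;
# it relies on HADOOP_CONF_DIR pointing at the cluster config set up above.
conf = SparkConf().setMaster("yarn-client").setAppName("LogParser")
sc = SparkContext(conf=conf)

# hypothetical path on the remote HDFS, for illustration only
logs = sc.textFile("hdfs:///tmp/sample_logs")
print(logs.count())
```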

