Java error when trying to run a program on PySpark

vngu2lb8 · asked 12 months ago · in Java

I entered these commands in PySpark:

In [1]: myrdd = sc.textFile("Cloudera-cdh5.repo")
In [2]: myrdd.map(lambda x: x.upper()).collect()

When I run 'myrdd.map(lambda x: x.upper()).collect()', I get an error.
Here is the error message:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, tiger): java.io.IOException: Cannot run program "/usr/local/bin/python3": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
        ... 13 more


The file /usr/local/bin/python3 does exist on disk.
How can I resolve this error?

xuo3flqw1#

You need to grant access permission on the path /usr/local/bin/python3; you can use the command sudo chmod 777 /usr/local/bin/python3.
I think the problem is caused by the PYSPARK_PYTHON variable, which tells each node where to find Python. You can set it with the following command:

export PYSPARK_PYTHON=/usr/local/bin/python3

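Since the executors fail when launching a Python worker at a path that does not exist, a related workaround (a minimal sketch, not part of the answer above) is to point PYSPARK_PYTHON at the interpreter actually running the driver script, before the SparkContext is created. The SparkContext creation itself is left commented out here, since it requires a Spark installation:

```python
import os
import sys

# Point Spark's Python workers at the same interpreter running this script.
# sys.executable always exists on the driver machine; on a real cluster the
# same path must also exist on every worker node (an assumption here).
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# These variables must be set BEFORE the SparkContext is created, because
# the Python worker factory reads them when it forks worker processes:
# from pyspark import SparkContext
# sc = SparkContext(appName="upper-demo")

print(os.environ["PYSPARK_PYTHON"])
```

Setting the variables in the script (rather than in the shell) keeps the fix next to the code that needs it, but an `export` in spark-env.sh, as in the answer above, applies cluster-wide.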

2vuwiymt2#

I'm on Windows 10 and ran into the same problem. I fixed it simply by copying python.exe, renaming the copy to python3.exe, and adding the folder containing python.exe to the PATH environment variable.

w1e3prcc3#

On the "sillier" side: it may not be a permissions problem at all. Perhaps you simply don't have python3 installed, or its PATH entry is wrong.
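That check can be scripted: before blaming permissions, verify that the path Spark was told to use actually resolves to an executable file. A small sketch using only the standard library (the candidate paths here are illustrative):

```python
import os
import shutil

# Candidate interpreters: the path from the error message, plus whatever
# "python3" resolves to on this machine's PATH (may be None if absent).
candidates = ["/usr/local/bin/python3", shutil.which("python3")]

for path in candidates:
    if path is None:
        print("python3 not found on PATH")
    elif not os.path.exists(path):
        print(f"{path}: does not exist")
    elif not os.access(path, os.X_OK):
        print(f"{path}: exists but is not executable")
    else:
        print(f"{path}: looks usable")
```

If the path printed as missing here is the one in the stack trace, the fix is to install Python there or repoint PYSPARK_PYTHON, not to change permissions.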

2w2cym1i4#

You can also make python point to python3:

sudo alternatives --set python /usr/bin/python3
python --version


vx6bjr1n5#

For Windows users: create a spark-env.cmd file in the conf directory and put the following line in it.

set PYSPARK_PYTHON=C:\Python39\python.exe

This Stack Overflow answer explains how to set environment variables for PySpark on Windows.
