python not found error in pyspark shell on windows 10

eit6fx6z · asked 2023-04-13 · in Windows
Follow (0) | Answers (2) | Views (113)

I am trying to install PySpark on Windows 10. When I try to create a DataFrame, I get the following error:

Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
21/07/21 21:53:00 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
21/07/21 21:53:07 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:131)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:535)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:189)
        at java.net.ServerSocket.implAccept(ServerSocket.java:545)
        at java.net.ServerSocket.accept(ServerSocket.java:513)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
        ... 29 more

I am installing with the following versions:

python - 3.9
java - 1.8
pyspark - 3.1.2

My SPARK_HOME is C:\spark\spark-3.1.2-bin-hadoop3.2
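The "Python was not found" line comes from the Windows 10 Microsoft Store alias stub: Spark's worker launcher runs whatever executable PYSPARK_PYTHON names (Spark 3.x falls back to "python3"), and on Windows 10 a bare python/python3 on PATH can resolve to the Store stub rather than a real interpreter, so the worker never connects back. A quick diagnostic sketch (my own, not part of Spark) to see what the worker would actually launch:

```python
import os
import shutil
import sys

# Spark's PythonWorkerFactory launches the executable named by
# PYSPARK_PYTHON (Spark 3.x falls back to "python3"). If that name
# resolves to the Microsoft Store alias stub -- or to nothing at all --
# the worker prints "Python was not found" and the driver then fails
# with "Python worker failed to connect back".
candidate = os.environ.get("PYSPARK_PYTHON", "python3")
resolved = shutil.which(candidate)

print(f"{candidate!r} resolves to: {resolved}")
if resolved is None:
    # Point Spark at the interpreter you are actually running.
    print(f"not on PATH; consider setting PYSPARK_PYTHON={sys.executable}")
```

If the resolved path points into `...\Microsoft\WindowsApps\`, you are hitting the alias stub, which is exactly what the first answer below fixes.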

qco9c6ql1#

For me, removing the App Execution Aliases for Python helped:
Press the Windows key (or the Start menu button) and type "App execution aliases".
In the dialog, disable:

  • App Installer python
  • App Installer python3
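Instead of (or in addition to) disabling the aliases, you can pin both the driver and the workers to the interpreter you are already running. PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are real PySpark settings; they must be set before the SparkSession (and its JVM) starts. A minimal sketch:

```python
import os
import sys

# Pin PySpark's driver and worker interpreters to the current Python,
# so the bare "python"/"python3" lookup never hits the Microsoft Store
# alias stub. Set these before creating the SparkSession.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

print(os.environ["PYSPARK_PYTHON"])
```

The same two variables can be set once in the system environment instead, which also covers `pyspark` launched from a plain shell.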

camsedfj2#

In my case, the problem did not go away even after following @juergi's solution. However, copying python.exe to python3.exe fixed it.
Here is the full flow:

(personal3_11) PS C:\Users\amit_tendulkar> $env:JAVA_HOME="C:\Program Files\Java\jdk-19"
(personal3_11) PS C:\Users\amit_tendulkar> pyspark
Missing Python executable 'python3', defaulting to 'D:\venv\personal3_11\Scripts\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
The system cannot find the path specified.
The system cannot find the path specified.
(personal3_11) PS C:\Users\amit_tendulkar> dir D:\venv\personal3_11\Scripts\..

    Directory: D:\venv\personal3_11

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           3/31/2023  3:07 PM                Include
d----           3/31/2023  3:07 PM                Lib
d----           4/12/2023  9:37 PM                Scripts
d----           4/12/2023  9:36 PM                share
-a---           3/31/2023  3:07 PM            332 pyvenv.cfg

(personal3_11) PS C:\Users\amit_tendulkar> dir D:\venv\personal3_11\Scripts\

    Directory: D:\venv\personal3_11\Scripts

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---           3/31/2023  3:07 PM           2076 activate
-a---           3/31/2023  3:07 PM           3394 activate-global-python-argcomplete
-a---           3/31/2023  3:07 PM           1005 activate.bat
-a---           3/31/2023  3:07 PM          26195 Activate.ps1
-a---           2/11/2023  1:23 AM           1089 beeline
-a---           2/11/2023  1:23 AM           1064 beeline.cmd
-a---           3/31/2023  3:07 PM            393 deactivate.bat
-a---           2/11/2023  1:23 AM          11283 docker-image-tool.sh
-a---           4/12/2023  9:36 PM           4115 find_spark_home.py
-a---           2/11/2023  1:23 AM           1935 find-spark-home
-a---           2/11/2023  1:23 AM           2685 find-spark-home.cmd
-a---           2/11/2023  1:23 AM           2337 load-spark-env.cmd
-a---           2/11/2023  1:23 AM           2678 load-spark-env.sh
-a---           4/12/2023  9:37 PM         108398 pip.exe
-a---           4/12/2023  9:37 PM         108398 pip3.11.exe
-a---           4/12/2023  9:37 PM         108398 pip3.exe
-a---           3/31/2023  3:07 PM         108383 pipx.exe
-a---           2/11/2023  1:23 AM           2636 pyspark
-a---           2/11/2023  1:23 AM           1170 pyspark.cmd
-a---           2/11/2023  1:23 AM           1542 pyspark2.cmd
-a---           3/31/2023  3:07 PM           2629 python-argcomplete-check-easy-install-script
-a---           3/31/2023  3:07 PM         270616 python.exe
-a---           3/31/2023  3:07 PM         259344 pythonw.exe
-a---           3/31/2023  3:07 PM           1977 register-python-argcomplete
-a---           2/11/2023  1:23 AM           1030 run-example
-a---           2/11/2023  1:23 AM           1223 run-example.cmd
-a---           2/11/2023  1:23 AM           3539 spark-class
-a---           2/11/2023  1:23 AM           1180 spark-class.cmd
-a---           2/11/2023  1:23 AM           2812 spark-class2.cmd
-a---           2/11/2023  1:23 AM           3122 spark-shell
-a---           2/11/2023  1:23 AM           1178 spark-shell.cmd
-a---           2/11/2023  1:23 AM           1818 spark-shell2.cmd
-a---           2/11/2023  1:23 AM           1065 spark-sql
-a---           2/11/2023  1:23 AM           1173 spark-sql.cmd
-a---           2/11/2023  1:23 AM           1118 spark-sql2.cmd
-a---           2/11/2023  1:23 AM           1040 spark-submit
-a---           2/11/2023  1:23 AM           1180 spark-submit.cmd
-a---           2/11/2023  1:23 AM           1155 spark-submit2.cmd
-a---           2/11/2023  1:23 AM           1039 sparkR
-a---           2/11/2023  1:23 AM           1168 sparkR.cmd
-a---           2/11/2023  1:23 AM           1097 sparkR2.cmd
-a---           3/31/2023  3:07 PM         108396 userpath.exe
-a---           4/12/2023  9:37 PM         108385 wheel.exe

(personal3_11) PS C:\Users\amit_tendulkar> python3
python3: The term 'python3' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
(personal3_11) PS C:\Users\amit_tendulkar> copy D:\venv\personal3_11\Scripts\python.exe D:\venv\personal3_11\Scripts\python3.exe
(personal3_11) PS C:\Users\amit_tendulkar> pyspark
Python 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
23/04/12 21:47:32 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/04/12 21:47:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.2
      /_/

Using Python version 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023 16:38:35)
Spark context Web UI available at http://PSL-HDVL6Q3.persistent.co.in:4040
Spark context available as 'sc' (master = local[*], app id = local-1681316253717).
SparkSession available as 'spark'.
>>>
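The copy step above can also be scripted, e.g. as part of venv setup. The sketch below recreates it against a temporary directory so it is safe to run anywhere; for a real venv you would use `os.path.dirname(sys.executable)` as `scripts_dir` instead (the directory layout here is illustrative, not taken from the session above):

```python
import os
import shutil
import sys
import tempfile

# Recreate the fix programmatically: if the Scripts directory has a
# python.exe but no python3.exe, copy one so Spark's "python3" lookup
# succeeds. Demonstrated against a temp dir; point scripts_dir at
# os.path.dirname(sys.executable) to apply it to the active venv.
scripts_dir = tempfile.mkdtemp()
shutil.copy(sys.executable, os.path.join(scripts_dir, "python.exe"))

python3 = os.path.join(scripts_dir, "python3.exe")
if not os.path.exists(python3):
    shutil.copy(os.path.join(scripts_dir, "python.exe"), python3)

print(os.path.exists(python3))  # True
```

This is the same effect as the manual `copy` command in the session, just repeatable.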
