Can I run PySpark locally on Windows 10 without installing Spark?

lb3vh1jj  posted on 2022-12-17  in Spark

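I need to build a proof of concept that uses PySpark, and I'd like to know whether I can install and use it through pip without installing and configuring Spark itself. I've read some answers suggesting that newer versions of PySpark let you run it in standalone mode without a full Spark installation, but when I try, I get the following error: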

Traceback (most recent call last):
  File "C:\Users\320181940\PycharmProjects\meetup\main.py", line 8, in <module>
    sc = SparkContext("local", "meetup_etl")
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\context.py", line 331, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Users\320181940\PycharmProjects\meetup\venv\lib\site-packages\pyspark\java_gateway.py", line 101, in launch_gateway
    proc = Popen(command, **popen_kwargs)
  File "C:\Python310\lib\subprocess.py", line 966, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1435, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I installed pyspark 3.1.3 with pip, and I'm trying to run it on Windows 10.

zc0qhyus  #1

You need to install Java and add JAVA_HOME to your environment variables.
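
To check whether Java is visible before creating a SparkContext, something like the following may help. It is only a sketch: find_java is a hypothetical helper, not part of PySpark, and it assumes the usual JDK layout where the executable lives in the bin folder under JAVA_HOME.

```python
import os
import shutil
from pathlib import Path


def find_java(env=None):
    """Return the path to a java executable, or None if none is found.

    Hypothetical helper (not a PySpark API). Checks JAVA_HOME first,
    then falls back to searching PATH.
    """
    env = os.environ if env is None else env

    java_home = env.get("JAVA_HOME")
    if java_home:
        # Assumption: a standard JDK layout, <JAVA_HOME>/bin/java(.exe).
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = Path(java_home) / "bin" / exe
        if candidate.is_file():
            return str(candidate)

    # Fall back to whatever "java" resolves to on PATH, if anything.
    return shutil.which("java", path=env.get("PATH"))


if __name__ == "__main__":
    java = find_java()
    print(java or "No Java found: install a JDK and set JAVA_HOME")
```

If this prints a path, Java is at least discoverable from the current process. Environment variables changed through the Windows settings dialog only reach terminals and IDEs started afterwards, so PyCharm may need a restart.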

woobm2wo  #2

Start a Python interpreter, create a Spark session, and run your code. Here is an example:

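from pyspark.sql import SparkSession

# Start a local session that uses all available cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()
# Build a one-column DataFrame named "text" from two sample rows.
rows = [["I'm ready!"], ["If I could put into words how much I love waking up at 6 am on Mondays I would."]]
df = spark.createDataFrame(rows).toDF("text")
df.show()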

Also, make sure to set HADOOP_HOME as specified in the gist.
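
The gist mentioned above is not reproduced here, so the following is only a sketch of one common way to set HADOOP_HOME from Python before the Spark session is created. configure_hadoop_home and the C:\hadoop path are hypothetical. It assumes the usual Windows setup, in which winutils.exe sits in the bin folder under HADOOP_HOME.

```python
import os
from pathlib import Path


def configure_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME and put its bin folder at the front of PATH.

    Hypothetical helper (not a PySpark API). Returns True if
    <hadoop_home>/bin/winutils.exe exists, False otherwise.
    """
    env = os.environ if env is None else env
    home = Path(hadoop_home)
    bin_dir = str(home / "bin")

    env["HADOOP_HOME"] = str(home)

    # Prepend the bin folder only if it is not already on PATH.
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if bin_dir not in parts:
        env["PATH"] = os.pathsep.join([bin_dir, *parts])

    return (home / "bin" / "winutils.exe").is_file()


# Example usage (the path is an assumption, adjust it to your machine):
#   if not configure_hadoop_home(r"C:\hadoop"):
#       print("winutils.exe not found under HADOOP_HOME\\bin")
#   # ...then create the SparkSession as in the answer above.
```

Call it before SparkSession.builder...getOrCreate(), because the JVM that PySpark launches inherits the environment of the Python process at that point.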
