Apache Spark: Hadoop bin directory does not exist

uxhixvfz · published 2022-12-23 in Apache

Program:

from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, LongType,
                               FloatType, DoubleType, StringType)

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE DATABASE icebergdb2")
spark.sql("USE icebergdb2")
schema = StructType([
  StructField("vendor_id", LongType(), True),
  StructField("trip_id", LongType(), True),
  StructField("trip_distance", FloatType(), True),
  StructField("fare_amount", DoubleType(), True),
  StructField("store_and_fwd_flag", StringType(), True)
])
spark.sql("CREATE TABLE icebergdb2.iceberg_table (vendor_id LONG, trip_id LONG, trip_distance FLOAT, fare_amount DOUBLE, store_and_fwd_flag STRING) USING iceberg")

Running the above program produces this error:

WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: Hadoop bin directory does not exist: C:\Users\abc\Desktop\ice\hadoop-3.3.1\etc\hadoop\bin -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "C:\Users\abc\Desktop\ice\ice.py", line 24, in <module>
    spark.sql("CREATE TABLE icebergdb2.iceberg_table \
  File "C:\Users\abc\anaconda3\lib\site-packages\pyspark\sql\session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "C:\Users\abc\anaconda3\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "C:\Users\abc\anaconda3\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
    return f(*a, **kw)  
PS C:\Users\abc\Desktop\ice> SUCCESS: The process with PID 1204 (child process of PID 8780) has been terminated.
SUCCESS: The process with PID 8780 (child process of PID 14136) has been terminated.
SUCCESS: The process with PID 14136 (child process of PID 14132) has been terminated.

I am trying to create an Apache Iceberg table with Apache Spark.
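Note that `CREATE TABLE ... USING iceberg` also requires an Iceberg catalog to be configured on the session; the plain `SparkSession.builder.getOrCreate()` above configures none. A minimal sketch, assuming a local Hadoop-type catalog (the catalog name `local`, the runtime package version, and the warehouse path are all assumptions to adapt to your installation):

```python
from pyspark.sql import SparkSession

# Sketch only: the catalog name "local", the iceberg-spark-runtime version,
# and the warehouse path are assumptions -- match them to your install.
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.1.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "file:///C:/ice/warehouse")
    .getOrCreate()
)
```

Tables are then addressed through that catalog, e.g. `local.icebergdb2.iceberg_table`.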


baubqpgj1#

Welcome!
It looks like you are trying to run Spark on Windows. For that you need a tool called "winutils". You can see this in your error message: Did not find winutils.exe: java.io.FileNotFoundException: Hadoop bin directory does not exist
Let's walk through the steps you need to take to run Spark examples correctly on your machine:

  1. Download Spark pre-built with Hadoop. For example, spark-3.3.1-bin-hadoop3.tgz.
  2. Extract this archive and copy the extracted folder to wherever you want it. For example, C:\Spark.
  3. Create an environment variable named SPARK_HOME and point it at the folder you just extracted and moved (C:\Spark from step 2).
  4. Add %SPARK_HOME%\bin to your PATH environment variable so that you can run Spark commands such as spark-submit and spark-shell.

  Now Spark is installed! To make it run properly on Windows, we still need to get the winutils tool working:

  5. Download winutils.exe and hadoop.dll from this website. Since they are precompiled against a 64-bit JVM, make sure you have a 64-bit JDK installed. Of course you can use other builds if you prefer; this is just an example.
  6. Move both files into a folder named bin. For example, C:\hadoop\bin.
  7. Create a HADOOP_HOME environment variable and point it at the parent of the bin folder containing those two files. In the previous example, that is C:\hadoop.
  8. Add %HADOOP_HOME%\bin to your PATH environment variable.
  9. Start a new terminal session (git bash, PowerShell, cmd, ...).

  Once you have done all of that, your problem should be gone!
  Hope this helps :)
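The file layout from steps 6 and 7 can be sanity-checked from Python before building the session. A small sketch (the `C:\hadoop` default is just the example path from the steps above):

```python
import os
from pathlib import Path

def check_hadoop_bin(hadoop_home):
    """Return the required Hadoop native files missing from <hadoop_home>\\bin."""
    bin_dir = Path(hadoop_home) / "bin"
    return [name for name in ("winutils.exe", "hadoop.dll")
            if not (bin_dir / name).exists()]

# C:\hadoop is the example layout from the steps above; adjust if yours differs.
missing = check_hadoop_bin(os.environ.get("HADOOP_HOME", r"C:\hadoop"))
if missing:
    print("Missing from HADOOP_HOME bin:", ", ".join(missing))
```

If this prints nothing, the WARN about winutils.exe should no longer appear.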
