Program:
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, LongType, FloatType, DoubleType, StringType
)

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE DATABASE icebergdb2")
spark.sql("USE icebergdb2")

schema = StructType([
    StructField("vendor_id", LongType(), True),
    StructField("trip_id", LongType(), True),
    StructField("trip_distance", FloatType(), True),
    StructField("fare_amount", DoubleType(), True),
    StructField("store_and_fwd_flag", StringType(), True)
])

spark.sql("CREATE TABLE icebergdb2.iceberg_table (vendor_id LONG, trip_id LONG, trip_distance FLOAT, fare_amount DOUBLE, store_and_fwd_flag STRING) USING iceberg")
Running this program produces the following error:
WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: Hadoop bin directory does not exist: C:\Users\abc\Desktop\ice\hadoop-3.3.1\etc\hadoop\bin -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "C:\Users\abc\Desktop\ice\ice.py", line 24, in <module>
spark.sql("CREATE TABLE icebergdb2.iceberg_table \
File "C:\Users\abc\anaconda3\lib\site-packages\pyspark\sql\session.py", line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File "C:\Users\abc\anaconda3\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "C:\Users\abc\anaconda3\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
return f(*a, **kw)
PS C:\Users\abc\Desktop\ice> SUCCESS: The process with PID 1204 (child process of PID 8780) has been terminated.
SUCCESS: The process with PID 8780 (child process of PID 14136) has been terminated.
SUCCESS: The process with PID 14136 (child process of PID 14132) has been terminated.
Creating an Apache Iceberg table with Apache Spark.
1 answer
Welcome!
It looks like you are trying to run Spark on Windows. For that you need a helper tool called "winutils". Your error message says as much:

Did not find winutils.exe: java.io.FileNotFoundException: Hadoop bin directory does not exist

Let's go through the steps you have to take to get the Spark example running properly on your machine:

1. Download a Spark release, e.g. spark-3.3.1-bin-hadoop3.tgz.
2. Unpack the archive and copy the extracted folder to wherever you want to keep it, e.g. C:\Spark.
3. Create an environment variable called SPARK_HOME and point it at the folder you just extracted and moved (C:\Spark from step 2).
4. Add %SPARK_HOME%/bin to your PATH environment variable, so that you can run Spark commands such as spark-submit and spark-shell.

Now Spark is installed! To make it run properly on Windows, we still need to get the winutils tool working:

1. Download winutils.exe and hadoop.dll from this website. Since they are precompiled for a 64-bit JVM, make sure you have a 64-bit JDK installed. You can of course use other builds if you like; this is just an example.
2. Move both files into a folder named bin, e.g. C:\hadoop\bin.
3. Create a HADOOP_HOME environment variable and point it at the parent of the bin folder that holds the two files; in the example above, that is C:\hadoop.
4. Add %HADOOP_HOME%/bin to your PATH environment variable.
5. Start a new terminal session (git bash, PowerShell, cmd, ...).
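If you prefer to set these variables from the script itself rather than in the Windows system settings, a minimal sketch could look like this; the paths C:\hadoop and C:\Spark are just the example locations used in the steps above, so adjust them to your machine:

```python
import os

# Example locations from the steps above -- change these to where you
# actually unpacked Spark and placed winutils.exe / hadoop.dll.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["SPARK_HOME"] = r"C:\Spark"

# On Windows, the Hadoop code inside Spark looks for winutils.exe under
# %HADOOP_HOME%\bin, so this is the file that must exist.
winutils = os.path.join(os.environ["HADOOP_HOME"], "bin", "winutils.exe")
print("Expecting winutils at:", winutils)
```

Note that this only works if it runs before SparkSession.builder.getOrCreate(); environment changes made after the JVM has started are not picked up.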
Once you have done all of that, your problem should be gone!
Hope this helps :)