How do I correctly set the variables for a PySpark-Snowflake connection?

yqyhoc1h · asked on 2021-05-27 · in Spark

I am following the docs at https://docs.snowflake.com/en/user-guide/spark-connector-use.html and trying to run a simple script, but it fails with:

Py4JJavaError: An error occurred while calling o37.load.
: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake.

My code is below. I also tried setting the config option with the paths to the Snowflake JDBC and spark-snowflake jars in /Users/Hana/spark-sf/, but no luck.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.types import *

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.jars','/Users/Hana/spark-sf/snowflake-jdbc-3.12.9.jar,/Users/Hana/spark-sf/spark-snowflake_2.12-2.8.1-spark_3.0.jar') \
    .getOrCreate()

# Set options below

sfOptions = {
  "sfURL" : "<account_name>.snowflakecomputing.com",
  "sfUser" : "<user_name>",
  "sfPassword" : "<password>",
  "sfDatabase" : "<database>",
  "sfSchema" : "<schema>",
  "sfWarehouse" : "<warehouse>"
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("query",  "select * from table limit 200") \
  .load()

df.show()

How do I set the variables correctly, and which ones actually need to be set? I would appreciate it if someone could list the steps!


7rtdyuoh1#

Can you try using the "snowflake" format?
So your DataFrame becomes:

df = spark.read.format("snowflake") \
  .options(**sfOptions) \
  .option("query",  "select * from table limit 200") \
  .load()

Or set the SNOWFLAKE_SOURCE_NAME variable to:

SNOWFLAKE_SOURCE_NAME = "snowflake"
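(Either spelling only resolves once the connector jar is actually on the classpath, so it is also worth checking that the jar's Scala and Spark suffixes match your installation. A purely illustrative helper, not part of the connector, that parses the jar filename from the question:)

```python
import re


def connector_matches(jar_name: str, scala_version: str, spark_version: str) -> bool:
    """Illustrative check: does a spark-snowflake jar filename target the
    given Scala and Spark versions? A mismatch here is a common cause of
    the ClassNotFoundException in the question."""
    m = re.match(
        r"spark-snowflake_(\d+\.\d+)-[\d.]+-spark_(\d+\.\d+)\.jar$", jar_name
    )
    return bool(m) and m.group(1) == scala_version and m.group(2) == spark_version


# The jar from the question targets Scala 2.12 and Spark 3.0:
print(connector_matches("spark-snowflake_2.12-2.8.1-spark_3.0.jar", "2.12", "3.0"))
print(connector_matches("spark-snowflake_2.12-2.8.1-spark_3.0.jar", "2.11", "2.4"))
```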
