PySpark: importing packages at runtime

Posted by b1zrtrql on 2021-05-26 in Spark

I am trying to connect PySpark to Google Analytics. I followed the documentation at https://github.com/crealytics/spark-google-analytics and it works fine.
My code snippet:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ga_free") \
    .getOrCreate()
df = spark.read.format("com.crealytics.google.analytics")\
    .option("clientId", "MY_CLIENT_ID")\
    .option("clientSecret", "SECRET")\
    .option("refreshToken", "REFRESH_TOKEN")\
    .option("ids", "ga:GA_ID")\
    .option("startDate", "today")\
    .option("endDate", "today")\
    .option("metrics","ga:sessions")\
    .load()

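I ran the code above on EMR with the following command:
spark-submit --packages com.crealytics:spark-google-analytics_2.11:1.1.2 my_script.py
My question is this: instead of passing --packages, I added the Google Analytics Maven coordinates in the code itself, based on the Spark documentation at https://spark.apache.org/docs/latest/configuration.html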

spark = SparkSession.builder \
    .appName("ga_free") \
    .config("spark.jars.packages","com.crealytics:spark-google-analytics_2.11:1.1.2") \
    .getOrCreate()

Then I ran spark-submit my_script.py and got this error:

Traceback (most recent call last):
  File "/home/hadoop/ga.py", line 23, in <module>
    .option("metrics","ga:sessions")\
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o89.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.crealytics.google.analytics. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.crealytics.google.analytics.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)

It looks like Spark is not picking up the Maven coordinates set in the code. Please help me resolve this.
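
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when a script is started with spark-submit, the driver JVM is already running by the time the Python code reaches SparkSession.builder, so a spark.jars.packages value set there may arrive too late for the package to be resolved. Under that assumption, the coordinates need to be supplied at launch time. The sketch below only builds the launch command; the helper name build_submit_command and its use_conf flag are hypothetical, and passing the key through --conf is expected to behave like --packages but has not been verified on EMR here.

```python
import shlex

# Maven coordinate taken from the question.
GA_PACKAGE = "com.crealytics:spark-google-analytics_2.11:1.1.2"


def build_submit_command(script, packages, use_conf=False):
    """Return a spark-submit argv that passes Maven coordinates at launch time.

    Hypothetical helper for illustration; it does not start Spark itself.
    """
    for coord in packages:
        parts = coord.split(":")
        # Coordinates are expected in groupId:artifactId:version form.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a groupId:artifactId:version coordinate: {coord!r}")
    joined = ",".join(packages)  # multiple coordinates are comma-separated
    if use_conf:
        # Same setting expressed as a configuration key instead of a flag.
        return ["spark-submit", "--conf", f"spark.jars.packages={joined}", script]
    return ["spark-submit", "--packages", joined, script]


cmd = build_submit_command("my_script.py", [GA_PACKAGE])
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from a wrapper script, which keeps the coordinates in one place without relying on the builder-level config.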
