Hadoop error: py4j.Py4JException: Method sql([class java.lang.String, class [Ljava.lang.Object;]) does not exist

2hh7jdfx asked on 2023-11-16 in Hadoop

I am using PySpark 3.4.1, Java 8, Hadoop 3.4.0, Scala 2.12.17, and Python 3.11.4. This is my code in VS Code:

def calculating_click(df):
    # keep only the click events
    click_data = df.filter(df.custom_track == "click")
    # replace nulls with 0 in the numeric columns
    click_data = click_data.na.fill({'bid': 0, 'job_id': 0, 'publisher_id': 0,
                                     'group_id': 0, 'campaign_id': 0})
    click_data.registerTempTable('clicks')  # name the temporary table 'clicks'
    click_output = spark.sql("""
        select job_id, date(ts) as date, hour(ts) as hour, publisher_id,
               campaign_id, group_id, avg(bid) as bid_set, count(*) as clicks,
               sum(bid) as spend_hour
        from clicks
        group by job_id, date(ts), hour(ts), publisher_id, campaign_id, group_id
    """)
    return click_output
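
A side note on the API: registerTempTable has been deprecated since Spark 2.0; createOrReplaceTempView is the drop-in replacement, and the spark.sql query above works unchanged:

click_data.createOrReplaceTempView('clicks')  # non-deprecated equivalent of registerTempTable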

I get this error:

Py4JError: An error occurred while calling o28.sql. Trace:
py4j.Py4JException: Method sql([class java.lang.String, class [Ljava.lang.Object;]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)


Can anyone help me solve this? I am trying to use PySpark but get an error every time. Which versions of Spark, Hadoop, and Java should I use?
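
This Py4JException is a classic symptom of the pyspark Python package and the Spark JVM being different versions: the two-argument sql(...) overload that a newer PySpark tries to call simply does not exist on an older JVM. A minimal sketch to compare the two sides (assuming a plain local session can start at all):

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
print("pyspark (Python side):", pyspark.__version__)  # version of the pip/conda package
print("Spark (JVM side):", spark.version)             # version of the JVM it is driving
# If these disagree (e.g. a 3.5.x package driving a 3.4.x JVM),
# spark.sql(...) can raise exactly this Py4JException.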


rseugnpd1#

I ran into the same problem today. I use a containerized build, and to see whether the problem was with the spark:latest image I pinned the version to 3.5.0: https://hub.docker.com/layers/apache/spark/3.5.0/images/sha256-a4a48089219912a8a87d7928541d576df00fc8d95f18a1509624e32b0e5c97d7?context=explore
I can confirm it did not work on the spark-submit side (cluster mode) with 3.5.0 until the versions were aligned. It looks like different versions of Py4J have compatibility problems with each other.
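
To see which versions are actually in play on each side, one hedged check (the jar path is an assumption for the apache/spark image, where SPARK_HOME is /opt/spark):

import glob
import py4j
import pyspark

print("pyspark (Python package):", pyspark.__version__)
print("py4j (Python package):", py4j.__version__)
# Spark ships its own Py4J jar; in the apache/spark image it sits under /opt/spark/jars
print("py4j jar(s) in Spark:", glob.glob("/opt/spark/jars/py4j-*.jar"))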
environment.yml (conda):

name: spark-submit-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.11
  - pandas==2.0.3
  - pyspark==3.5.0
  - pyarrow==12.0.1

Dockerfile:

FROM apache/spark:3.5.0
USER root
ENV HADOOP_CONF_DIR=/conf
ENV SPARK_CONF_DIR=/conf
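
For context, the overload the mismatched JVM could not find is the parameterized form of SparkSession.sql (positional ? parameters were added in Spark 3.5). Once the Python and JVM sides both run 3.5.0, a call like this resolves; a minimal self-contained sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
# Positional-parameter SQL: this is the sql(String, Object[]) signature
# from the stack trace; it only exists on a Spark 3.5+ JVM.
spark.sql("select ? as x", args=[1]).show()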
