Apache Spark SQL: cannot resolve column created in the SQL text given input columns

mi7gmzs6, posted 2021-05-27 in Spark

I am trying to run a Spark SQL statement that does a simple group by with an aggregation. It complains that the month column cannot be resolved given the input columns I provided in the schema, even though the tutorial I am following runs the same code successfully.
Code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.*;

// `spark` (the SparkSession) and `inMemory` (the in-memory rows, e.g. a List<Row>) are created earlier and omitted here.
StructField[] fields = new StructField[]{
        new StructField("level", DataTypes.StringType, false, Metadata.empty()),
        new StructField("datetime", DataTypes.StringType, false, Metadata.empty())
};

StructType schema = new StructType(fields);
Dataset<Row> dateSet = spark.createDataFrame(inMemory, schema);
dateSet.createOrReplaceTempView("logging_level");
Dataset<Row> results = spark.sql("select level, date_format(datetime, 'MMMM') as month, count(1) as total from logging_level group by level, month");

Stack trace:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`month`' given input columns: [level, datetime]; line 1 pos 107
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)....

w1jd8yoj #1

You cannot reuse an alias defined in the select clause inside the group by clause. You need to repeat the expression:

select level, date_format(datetime, 'MMMM') as month, count(*) as total 
from logging_level 
group by level, date_format(datetime, 'MMMM')

Note that I also replaced count(1) with count(*): it is more efficient and returns the same result.
Many databases let you refer to select-list columns by position in the group by. I believe Spark is one of them, so:

select level, date_format(datetime, 'MMMM') as month, count(*) as total 
from logging_level 
group by 1, 2
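As far as I know, positional group by in Spark is governed by the spark.sql.groupByOrdinal setting, which is enabled by default.

Alternatively, building the query with the DataFrame API avoids the alias problem entirely, because the same column expression is passed to both the grouping and the projection. A sketch under the question's setup (the dateSet Dataset and variable names are taken from the question):

import static org.apache.spark.sql.functions.*;

// group by level and the formatted month, then count rows per group
Dataset<Row> results = dateSet
        .groupBy(col("level"), date_format(col("datetime"), "MMMM").alias("month"))
        .agg(count("*").alias("total"));
results.show();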
