Apache Spark Py4j.protocol.Py4JJavaError: An error occurred while calling o69.sql. java.lang.AssertionError: assertion failed

5jdjgkvh posted on 2023-10-23 in Apache

The bounty expires in 6 days. Answers to this question are eligible for a +50 reputation reward. Arya is looking for an answer from a reputable source: any advice is welcome, since I could not find any information on Google or the Cloudera site.

Running a simple Spark SQL query

spark.sql("select count(1) from edw.result_base").show()

I get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o69.sql. java.lang.AssertionError: assertion failed

The table format is Parquet and the Spark version is 2.4.8.
I tried setting the property below, but I still get the error.

spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

The full stack trace of the error is shown below:

java.lang.AssertionError: assertion failed
      at scala.Predef$.assert(Predef.scala:208)
      at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:261)
      at org.apache.spark.sql.hive.HiveMetastoreCatalog.convert(HiveMetastoreCatalog.scala:137)
      at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:220)
      at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:207)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$4(AnalysisHelper.scala:113)
      at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:214)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:374)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:113)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
      at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:207)
      at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:191)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:130)
      at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
      at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
      at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:49)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:127)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:119)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:119)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:168)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:162)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:122)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:98)
      at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:98)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:146)
      at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:145)
      at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:66)
      at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
      at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:63)
      at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:63)
      at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:55)

fcy6dtqo 1#

After looking into this a bit, it seems you are on the right track with the spark.sql.hive.convertMetastoreOrc parameter (see, for example, this and this similar error). However, since you are using Parquet as the underlying storage, the parameter you need is spark.sql.hive.convertMetastoreParquet.
Try setting spark.sql.hive.convertMetastoreParquet to false instead of true. true is the default, so setting it explicitly won't get you far, because the default behavior is exactly what causes this error.
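For reference, a minimal PySpark sketch of the suggested workaround, assuming the same session and table name as in the question:

# Disable conversion of the Hive Parquet table to Spark's native Parquet
# relation; the table is then read through the Hive serde, which sidesteps
# the schema assertion in HiveMetastoreCatalog.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

# Re-run the original query
spark.sql("select count(1) from edw.result_base").show()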


bq9c1y66 2#

This assertion error means that the logical schema of the Hive table edw.result_base does not match the physical schema of the underlying Parquet files. Here is the relevant code in HiveMetastoreCatalog.scala:

:
// The inferred schema may have different field names as the table schema, we should respect
// it, but also respect the exprId in table relation output.
assert(result.output.length == relation.output.length &&
  result.output.zip(relation.output).forall { case (a1, a2) => a1.dataType == a2.dataType })
:

Of course, you could simply disable spark.sql.hive.convertMetastoreParquet, but I think the safer approach is to find out where the differences actually are (as in the sketch below), because otherwise they may lead to incorrect query results, especially if data types don't match or there are extra columns in the table definition.
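As a rough illustration of how to track down the mismatch, here is a minimal PySpark sketch. It assumes the table's storage path can be read from the Location row of DESCRIBE FORMATTED output and that the Parquet files are readable from the same session; the column comparison is illustrative only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
table_name = "edw.result_base"

# Schema registered in the Hive metastore
metastore_schema = spark.table(table_name).schema

# Physical location of the table, taken from DESCRIBE FORMATTED output
location = (spark.sql("DESCRIBE FORMATTED " + table_name)
            .filter("col_name = 'Location'")
            .collect()[0]["data_type"])

# Schema Spark infers directly from the Parquet files
parquet_schema = spark.read.parquet(location).schema

# Report fields whose data types differ, plus fields present on one side only
metastore_fields = {f.name.lower(): f.dataType for f in metastore_schema}
parquet_fields = {f.name.lower(): f.dataType for f in parquet_schema}
for name in sorted(set(metastore_fields) | set(parquet_fields)):
    if metastore_fields.get(name) != parquet_fields.get(name):
        print(name, metastore_fields.get(name), parquet_fields.get(name))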
Also note that in newer Spark versions this assert has been refactored to throw more detailed and accurate AnalysisExceptions.
As for setting the driver classpath in Jupyter, this SO answer seems to work, though I don't think you need any extra jars in your case, since the schema can be inferred from Parquet just fine.
