I'm running into a problem when using a UDF in Spark (Scala). Here is a sample that reproduces it:
import org.apache.spark.sql.{SparkSession, DataFrame}
import org.apache.spark.sql.functions.{col, udf}
val spark = SparkSession.builder.appName("test")
.master("local[*]")
.getOrCreate()
import spark.implicits._
def func(a: Array[Int]): Array[Int] = a
val funcUDF = udf((a: Array[Int]) => func(a))
var data = Seq(Array(1, 2, 3), Array(3, 4, 5), Array(6, 2, 4)).toDF("items")
data = data.withColumn("a", funcUDF(col("items")))
data.show()
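The error I get is a ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [I (that is, Array[Int]), thrown at org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$f$2. I have included part of the stack trace below. In case it matters, I'm running this on https://community.cloud.databricks.com/.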
Caused by: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [I
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.$anonfun$f$2(ScalaUDF.scala:155)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1125)
    at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:83)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$15.$anonfun$applyOrElse$70(Optimizer.scala:1557)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$15.applyOrElse(Optimizer.scala:1557)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$15.applyOrElse(Optimizer.scala:1552)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:80)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:153)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:151)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:412)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:250)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:410)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:363)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:153)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:151)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:412)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:250)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:410)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:363)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:153)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:151)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:412)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:250)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:410)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:363)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:327)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:153)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:151)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:311)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$.apply(Optimizer.scala:1552)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$.apply(Optimizer.scala:1551)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:152)
    at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
    at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:38)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:149)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:141)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:141)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:119)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:119)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:107)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:171)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:836)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:104)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:104)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:246)
    at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:466)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$writePlans(QueryExecution.scala:246)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:256)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:109)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:836)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3700)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2711)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2918)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:305)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:342)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:838)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:797)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:806)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1114467142343660:14)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1114467142343660:164)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1114467142343660:166)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-1114467142343660:168)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw...$$iw.<init>(command-1114467142343660:172)
    at lineedcf33d032244134ad784ac9de826d3b265.$read$$iw...$$iw.<init>(command-1114467142343660:176)
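The trace above suggests that Spark hands the array column to the UDF as a scala.collection.mutable.WrappedArray (a Seq), not as a JVM Array[Int], so the cast to [I fails inside ScalaUDF. A minimal sketch of one possible workaround, assuming that is the cause, is to declare the UDF parameter as Seq[Int]. The object name SeqUdfSketch is hypothetical, and the standalone main method is only there to make the sketch self-contained; in a Databricks notebook you would reuse the existing spark session and skip spark.stop().

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical wrapper object so the sketch compiles on its own.
object SeqUdfSketch {
  // Same identity logic as func in the question, but typed on Seq[Int].
  def func(a: Seq[Int]): Seq[Int] = a

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("test")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: array columns reach a Scala UDF as a Seq, so the
    // parameter is declared as Seq[Int] rather than Array[Int].
    val funcUDF = udf((a: Seq[Int]) => func(a))

    val data = Seq(Array(1, 2, 3), Array(3, 4, 5), Array(6, 2, 4)).toDF("items")
    val result = data.withColumn("a", funcUDF(col("items")))
    result.show()

    spark.stop()
  }
}
```

If the body of func really needs an Array[Int], calling a.toArray inside the UDF converts the incoming Seq explicitly instead of relying on a cast.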