How to sum two columns in Scala Spark

b1zrtrql · posted 2021-05-26 in Spark

I have a table like this:

+------+-----+
|  ColA| ColB|
+------+-----+
|    5 |    1| 
|    8 |    2| 
+------+-----+

I need to add a summary column that adds the row values together, like this:

+------+-----+-----+
|  ColA| ColB| SUM |
+------+-----+-----+
|    5 |    1|    6|
|    8 |    2|   10|
+------+-----+-----+

Here is what I tried:

var foo = df.withColumn("SUM", sum(df("ColA"), df("ColB")))

But I get: error: overloaded method value sum with alternatives:

pvcm50d1 1#

One way is as follows. The error comes from the fact that sum in org.apache.spark.sql.functions is an aggregate over a single column, so it cannot add two columns row by row; use the + operator on columns instead:

import spark.implicits._
import org.apache.spark.sql.functions._

val data = List((1,5), (4,3), (6,2))
val df = spark.sparkContext.parallelize(data).toDF("ColA", "ColB")

var foo = df.select("ColA", "ColB")
    .withColumn("SUM", col("ColA") + col("ColB"))
foo.show()
/*
+----+----+---+
|ColA|ColB|SUM|
+----+----+---+
|   1|   5|  6|
|   4|   3|  7|
|   6|   2|  8|
+----+----+---+

*/

// or

var foo2 = df.selectExpr(
    "ColA",
    "ColB",
    "ColA + ColB as SUM"
  )
foo2.show()
/*
+----+----+---+
|ColA|ColB|SUM|
+----+----+---+
|   1|   5|  6|
|   4|   3|  7|
|   6|   2|  8|
+----+----+---+

*/
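
For completeness, the same result can also be produced with plain SQL by registering the DataFrame as a temporary view. This is a minimal sketch reusing the df and spark values from the snippet above (the view name "t" is arbitrary), not part of the original answer:

// or, via Spark SQL on a temporary view
df.createOrReplaceTempView("t")
val foo3 = spark.sql("SELECT ColA, ColB, ColA + ColB AS SUM FROM t")
foo3.show()  // same output as the two variants above

Note that with any of these approaches, if either column is null for a row, the resulting SUM is also null; wrapping each column in coalesce(col("ColA"), lit(0)) treats missing values as zero.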
