Scala - How to apply a transformation on a Spark DataFrame to produce tuples?

zc0qhyus · Published 2021-05-27 in Spark

I have the following sample DataFrame in Spark Scala:

+-------+--------+-----------------+
|col1   |    col2|             col3|
+-------+--------+-----------------+
|    200|20200218|batched-202002180|
|    207|20200218|batched-202002190|
+-------+--------+-----------------+

Currently I collect the values of a single column in Spark, which produces the following output:

scala> val result = newDF.select("col3").collect.map(row => row(0).toString)
result: Array[String] = Array(batched-202002180, batched-202002190)

Now, how do I also select the other two columns, col1 and col2, and collect all three columns as an array of tuples? For brevity I have shown only three columns in the DataFrame above; in practice we expect more than three.
Expected output:

Array((200, 20200218, "batched-202002180"), (207, 20200218, "batched-202002190"))

v64noz0r1#

There is no need to convert to an RDD. Check the code below.

scala> import org.apache.spark.sql.functions.col

scala> df
.withColumn("col1", col("col1").cast("long"))
.withColumn("col2", col("col2").cast("long")).show(false)
+----+--------+-----------------+
|col1|col2    |col3             |
+----+--------+-----------------+
|200 |20200218|batched-202002180|
|207 |20200218|batched-202002190|
+----+--------+-----------------+

scala> df.map(r => (r.getAs[Long](0),r.getAs[Long](1),r.getAs[String](2))).collect()
res229: Array[(Long, Long, String)] = Array((200,20200218,batched-202002180), (207,20200218,batched-202002190))
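Since the question notes that the real DataFrame has more than three columns, writing out one `getAs` per column does not scale. A minimal sketch of a column-count-agnostic alternative (an assumption on my part, not part of the answer above) is to collect each `Row` as a `Seq`:

```scala
// Generalizes to any number of columns: Row.toSeq turns each row
// into a Seq[Any], at the cost of losing the static element types
// that the hand-written tuple version provides.
val rows: Array[Seq[Any]] = df.collect().map(_.toSeq)
```

If you need real tuples, you still have to enumerate the columns, because tuple arity is fixed at compile time.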

chy5wohz2#

You can do something similar to the following:

newDF.map(r => (r.getAs[Long](0),r.getAs[Long](1),r.getAs[String](2))).collect()

This will give you an Array[(Long, Long, String)]. If you want to convert the values to strings, you can use:

val result = newDF.select(cols.head, cols.tail: _*).map(r => (r.getLong(0).toString,r.getLong(1).toString,r.getString(2))).collect()

This will give you an Array[(String, String, String)].
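Note that `cols` is not defined in the snippet above, and calling `.map` on a DataFrame needs the session's implicit encoders in scope. A minimal sketch of the missing setup, assuming `cols` is simply the sequence of column names from the question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("tuples").getOrCreate()
import spark.implicits._ // provides the Encoder required by .map on a Dataset

// Assumed definition of `cols`, based on the question's schema:
val cols = Seq("col1", "col2", "col3")
```

The `select(cols.head, cols.tail: _*)` pattern exists to satisfy the `select(col: String, cols: String*)` overload, which requires at least one fixed argument before the varargs.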
