I have two DataFrames:
val df1 = Seq(("1","2","3"),("4","5","6")).toDF("A","B","C")
df1.show
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
+---+---+---+
and
val df2 = Seq(("11","22","33"),("44","55","66")).toDF("D","E","F")
df2.show
+---+---+---+
|  D|  E|  F|
+---+---+---+
| 11| 22| 33|
| 44| 55| 66|
+---+---+---+
I need to combine them into the following:
val df3 = Seq(("1","2","3","","",""),("4","5","6","","",""),("","","","11","22","33"),("","","","44","55","66"))
.toDF("A","B","C","D","E","F")
df3.show
+---+---+---+---+---+---+
|  A|  B|  C|  D|  E|  F|
+---+---+---+---+---+---+
|  1|  2|  3|   |   |   |
|  4|  5|  6|   |   |   |
|   |   |   | 11| 22| 33|
|   |   |   | 44| 55| 66|
+---+---+---+---+---+---+
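Right now I manually create the missing columns in every DataFrame to get a common structure and then use union. That code is specific to these DataFrames and does not scale.
I am looking for a solution that works with x DataFrames of y columns each.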
2 Answers
pftdvrlh #1
You can manually create the missing columns in both DataFrames and then union them.
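The idea above can be sketched generically for any number of DataFrames. This is a minimal sketch, not the original answer's code: `unionAll` is a hypothetical helper name, and it assumes every column is a string (as in the question), so missing columns are filled with `""`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("union-missing-cols").getOrCreate()
import spark.implicits._

val df1 = Seq(("1", "2", "3"), ("4", "5", "6")).toDF("A", "B", "C")
val df2 = Seq(("11", "22", "33"), ("44", "55", "66")).toDF("D", "E", "F")

// Hypothetical helper: align every DataFrame to the same column list, then union.
// Assumption: all columns are strings, so "" is a valid placeholder value.
def unionAll(dfs: Seq[DataFrame]): DataFrame = {
  // All column names, in order of first appearance.
  val allCols = dfs.flatMap(_.columns).distinct
  dfs.map { df =>
    val present = df.columns.toSet
    // Keep existing columns, add the missing ones as empty strings.
    df.select(allCols.map(c => if (present(c)) col(c) else lit("").as(c)): _*)
  }.reduce(_ union _)
}

val df3 = unionAll(Seq(df1, df2))
df3.show()
```

On Spark 3.1 or later, `df1.unionByName(df2, allowMissingColumns = true)` does the alignment for you, but it fills the missing columns with null rather than `""`.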
zbq4xfa0 #2
A simpler approach is to do a full outer join with the join expression/condition set to false.
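A minimal sketch of that join, using the question's `df1` and `df2`. Because the condition never matches, every row from either side is kept and the other side's columns come back as null; the session setup and the name `joined` are only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("full-join-false").getOrCreate()
import spark.implicits._

val df1 = Seq(("1", "2", "3"), ("4", "5", "6")).toDF("A", "B", "C")
val df2 = Seq(("11", "22", "33"), ("44", "55", "66")).toDF("D", "E", "F")

// Full outer join on a condition that is always false:
// no rows pair up, so each input row appears once, padded with nulls.
val joined = df1.join(df2, lit(false), "full")
joined.show()
```

The row order of the result is not guaranteed, so sort explicitly if the order matters.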
If you want the nulls to actually be empty strings, just append .na.fill("") to the result.
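As a sketch of the null-filling step on its own (the name `filled` is only for illustration): `na.fill("")` replaces nulls in string columns only, which is enough here because every column in the question is a string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("fill-nulls").getOrCreate()
import spark.implicits._

val df1 = Seq(("1", "2", "3"), ("4", "5", "6")).toDF("A", "B", "C")
val df2 = Seq(("11", "22", "33"), ("44", "55", "66")).toDF("D", "E", "F")

// The join leaves nulls in the columns that come from the other side.
val joined = df1.join(df2, lit(false), "full")

// Replace nulls in all string columns with the empty string.
val filled = joined.na.fill("")
filled.show()
```

Non-string columns would keep their nulls, so they would need their own `na.fill` call with a value of the matching type.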
All in all,
df1.join(df2, lit(false), "full").na.fill("")
should do the trick.
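Since the question asks about x DataFrames, the same join can be folded over a list. This is only a sketch: `stackAll` and the third DataFrame `dfX` are hypothetical, and it assumes the DataFrames have no column names in common (shared names would produce duplicate columns) and that all columns are strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("stack-many").getOrCreate()
import spark.implicits._

val df1 = Seq(("1", "2", "3"), ("4", "5", "6")).toDF("A", "B", "C")
val df2 = Seq(("11", "22", "33"), ("44", "55", "66")).toDF("D", "E", "F")
// Hypothetical third DataFrame, added only to show more than two inputs.
val dfX = Seq(("7", "8")).toDF("G", "H")

// Hypothetical helper: chain the never-matching full outer join across all inputs,
// then turn the resulting nulls into empty strings.
def stackAll(dfs: Seq[DataFrame]): DataFrame =
  dfs.reduce((left, right) => left.join(right, lit(false), "full")).na.fill("")

val combined = stackAll(Seq(df1, df2, dfX))
combined.show()
```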