How to create an array of structs in Spark

Posted by 5vf7fwbs on 2021-05-19 in Spark

I'm trying to create an array of struct(col, col) in a Spark DataFrame, but I get an error. The same error occurs with the sample data below.
DataFrame:

val df = Seq((1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false)).toDF("number", "English", "Spanish", "include_spanish")

scala> df.show
+------+-------+-------+---------------+
|number|English|Spanish|include_spanish|
+------+-------+-------+---------------+
|     1|    One|    uno|           true|
|     2|    Two|    Dos|           true|
|     3|  Three|   Tres|          false|
+------+-------+-------+---------------+

Next, I create structs from the existing columns and then try to build an array from them:

import org.apache.spark.sql.functions._

val df1 = df.
  withColumn("numberToEnglish", struct(col("number"), col("English"))).
  withColumn("numberToSpanish", struct("number", "Spanish")).
  withColumn("numberToLanguage",
    when(col("include_spanish") === true, array("numberToEnglish", "numberToSpanish")).
    otherwise(array("numberToEnglish"))
  )

This fails with the following error:

org.apache.spark.sql.AnalysisException: cannot resolve 'array(`numberToEnglish`, `numberToSpanish`)' due to data type mismatch: input to function array should all be the same type, but it's [struct<number:int,English:string>, struct<number:int,Spanish:string>];;
'Project [number#200, English#201, Spanish#202, include_spanish#203, numberToEnglish#253, numberToSpanish#259, CASE WHEN (include_spanish#203 = true) THEN array(numberToEnglish#253, numberToSpanish#259) ELSE array(numberToEnglish#253) END AS numberToLanguage#266]

What is the best way to do this?
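The error message already names the cause: the two structs differ only in the name of their second field. A quick way to confirm this is to compare the two column types directly. The sketch below is a minimal standalone program that assumes a local SparkSession; the object name StructTypeMismatch and the helper structTypes are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.DataType

// Hypothetical helper object, for illustration only.
object StructTypeMismatch {
  // Builds the two struct columns as in the question and returns their data types.
  def structTypes(spark: SparkSession): (DataType, DataType) = {
    import spark.implicits._
    val df = Seq((1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false))
      .toDF("number", "English", "Spanish", "include_spanish")

    // struct() names each field after its input column.
    val withStructs = df
      .withColumn("numberToEnglish", struct(col("number"), col("English")))
      .withColumn("numberToSpanish", struct(col("number"), col("Spanish")))

    // The schema is available without running a job.
    (withStructs.schema("numberToEnglish").dataType,
     withStructs.schema("numberToSpanish").dataType)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("struct-mismatch").getOrCreate()
    val (english, spanish) = structTypes(spark)
    println(english.simpleString) // struct<number:int,English:string>
    println(spanish.simpleString) // struct<number:int,Spanish:string>
    println(english == spanish)   // false: field names are part of a StructType
    spark.stop()
  }
}
```

Because the types are unequal, array() has no single element type to use and analysis fails.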

2wnc66cl


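The field names of a struct are part of its type, so struct($"number", $"English") has type struct<number:int,English:string> while struct($"number", $"Spanish") has type struct<number:int,Spanish:string>. For the array function to treat them as the same data type, give the struct fields matching names with .as, as shown below: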

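import org.apache.spark.sql.functions.{array, struct, when}
import spark.implicits._ // already in scope in spark-shell; needed for $"..." and toDF

val df = Seq(
    (1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false)
  ).toDF("number", "English", "Spanish", "include_spanish")

// Alias the fields to the same names (num, lang) so both structs have the
// identical type struct<num:int,lang:string> and can go into one array
df.
  withColumn("numberToEnglish", struct($"number".as("num"), $"English".as("lang"))).
  withColumn("numberToSpanish", struct($"number".as("num"), $"Spanish".as("lang"))).
  withColumn("numberToLanguage",
    when($"include_spanish", array($"numberToEnglish", $"numberToSpanish")).
    otherwise(array($"numberToEnglish"))
  ).
  show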
// +------+-------+-------+---------------+---------------+---------------+--------------------+
// |number|English|Spanish|include_spanish|numberToEnglish|numberToSpanish|    numberToLanguage|
// +------+-------+-------+---------------+---------------+---------------+--------------------+
// |     1|    One|    uno|           true|       [1, One]|       [1, uno]|[[1, One], [1, uno]]|
// |     2|    Two|    Dos|           true|       [2, Two]|       [2, Dos]|[[2, Two], [2, Dos]]|
// |     3|  Three|   Tres|          false|     [3, Three]|      [3, Tres]|        [[3, Three]]|
// +------+-------+-------+---------------+---------------+---------------+--------------------+
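If the struct columns already exist with different field names and rebuilding them is inconvenient, one alternative (not part of the original answer) is to cast each of them to a common struct type. A struct-to-struct cast matches fields by position and takes the field names from the target type. The sketch below assumes a local SparkSession; the object name CastStructAlternative and the target type struct<num:int,lang:string> are illustrative choices, not anything from the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct, when}

// Hypothetical helper object, for illustration only.
object CastStructAlternative {
  // Assumed common type; the field names num/lang are an arbitrary choice.
  val TargetType = "struct<num:int,lang:string>"

  def numberToLanguage(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq((1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false))
      .toDF("number", "English", "Spanish", "include_spanish")

    df
      // Built exactly as in the question: the second field names differ.
      .withColumn("numberToEnglish", struct(col("number"), col("English")))
      .withColumn("numberToSpanish", struct(col("number"), col("Spanish")))
      // Cast both to the same struct type; fields are matched by position and renamed.
      .withColumn("numberToEnglish", col("numberToEnglish").cast(TargetType))
      .withColumn("numberToSpanish", col("numberToSpanish").cast(TargetType))
      // Both array() arguments now share one type, so the expression resolves.
      .withColumn("numberToLanguage",
        when(col("include_spanish"), array(col("numberToEnglish"), col("numberToSpanish")))
          .otherwise(array(col("numberToEnglish"))))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cast-struct").getOrCreate()
    val out = numberToLanguage(spark)
    out.printSchema()
    // Structs per row: two when include_spanish is true, one otherwise.
    out.select("numberToLanguage").collect()
      .foreach(r => println(r.getSeq[Row](0).size))
    spark.stop()
  }
}
```

Renaming with .as when the structs are built, as in the answer, avoids the extra casts; the cast is mainly useful when the struct columns come from elsewhere.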
