Array manipulation inside a map function: Spark 1.6

Posted by bvjxkvbb on 2021-05-27 in Spark

I have a column that is a wrapped array of structs, each containing an integer and a double value.
The schema looks like this:

 |-- pricing_data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: integer (nullable = false)
 |    |    |-- _2: double (nullable = false)

So whenever this column's value is (0, 0.0), I need to change it to an empty array: (0, 0.0) -> [].
How can I do this with map? Or with the DataFrame API?

flmtquvp #1

Try this for Spark >= 2.4:

```scala
import org.apache.spark.sql.functions.expr
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(Seq((0, 0.0)), Seq((1, 2.2))).toDF("pricing_data")
df.show(false)
df.printSchema()

/**
  * +------------+
  * |pricing_data|
  * +------------+
  * |[[0, 0.0]]  |
  * |[[1, 2.2]]  |
  * +------------+
  *
  * root
  * |-- pricing_data: array (nullable = true)
  * |    |-- element: struct (containsNull = true)
  * |    |    |-- _1: integer (nullable = false)
  * |    |    |-- _2: double (nullable = false)
  */

// Replace every (0, 0.0) element with a struct whose fields are both null.
df.withColumn("pricing_data", expr(
  "TRANSFORM(pricing_data, x -> if(x._1=0 and x._2=0.0, named_struct('_1', null, '_2', null), x))"
))
  .show(false)

/**
  * +------------+
  * |pricing_data|
  * +------------+
  * |[[,]]       |
  * |[[1, 2.2]]  |
  * +------------+
  */
```
For Spark < 2.4:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import scala.collection.mutable

// Reuse the column's own type as the UDF's return type.
val dataType = df.schema("pricing_data").dataType
val replace = udf((arrayOfStruct: mutable.WrappedArray[Row]) => {
  arrayOfStruct.map(row => {
    // Explicit [Any] so the value type is not inferred as Nothing.
    val map = row.getValuesMap[Any](row.schema.map(_.name))
    if (map("_1") == 0 && map("_2") == 0.0) {
      Row.fromTuple((null, null))
    } else row
  })
}, dataType)

df.withColumn("pricing_data", replace($"pricing_data"))
    .show(false)

/**
  * +------------+
  * |pricing_data|
  * +------------+
  * |[[,]]       |
  * |[[1, 2.2]]  |
  * +------------+
  */
```
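
Note that the output above keeps one element per array, with its fields nulled, rather than producing a truly empty array. The difference can be sketched on plain Scala collections, without a Spark session. This is only an illustration: `PricingSketch`, `Pair`, `blankZeroPairs` and `dropZeroPairs` are hypothetical names that do not appear in the thread, and `None` stands in for the struct whose fields are both null.

```scala
// Plain-Scala sketch, no Spark dependency. All names are hypothetical.
object PricingSketch {
  // One array element, mirroring struct<_1: int, _2: double>.
  type Pair = (Int, Double)

  // True for the (0, 0.0) element the question wants to get rid of.
  private def isZeroPair(p: Pair): Boolean = p._1 == 0 && p._2 == 0.0

  // Keeps the element's slot but blanks it out, which is the shape of the
  // [[,]] output shown in the answer. None models the all-null struct.
  def blankZeroPairs(data: Seq[Pair]): Seq[Option[Pair]] =
    data.map(p => if (isZeroPair(p)) None else Some(p))

  // Drops the element entirely, so an array that held only (0, 0.0)
  // becomes an empty array, as the question literally asks.
  def dropZeroPairs(data: Seq[Pair]): Seq[Pair] =
    data.filterNot(isZeroPair)
}
```

For example, `PricingSketch.dropZeroPairs(Seq((0, 0.0)))` returns an empty `Seq`, while `PricingSketch.blankZeroPairs(Seq((0, 0.0)))` returns `Seq(None)`. If the empty array is what you need on Spark >= 2.4, the SQL higher-order function `FILTER` would presumably be the counterpart of `dropZeroPairs`; that is an assumption about how to apply it, not something the answer shows.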
