Apache Spark: removing a key from a nested Map

Posted by n9vozmp4 on 2022-11-25 in Apache

Looking for help:

Data: map (nullable = true)
    |-- key: string
    |-- value: map (valueContainsNull = true)
    |    |-- key : string
    |    |-- value : string (valueContainsNull = true)

I followed the linked question, "Passing a map with struct-type key into a Spark UDF", and created a UDF that concatenates strings:

val myUDF1 = udf((inputMapping:Map[String,Row]) => inputMapping
     .map{case(key,value)=>(key, (value.getString(0),value.getString(1)))}
     .map{ case (key,(i1,i2))=> (key,(i1  + i2)) }
     )

df.withColumn("udfResult", myUDF1($"Data")).show()

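I want to do the same kind of thing, but instead of adding integers I want to remove a key from the value, which is of string type. How can I achieve this? I tried the UDF above but got this error:

Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.spark.sql.Row (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.spark.sql.Row is in unnamed module of loader 'app')

I want to remove a specific key from the nested MapType value of the outer map: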

Data: map (nullable = true)
    |-- key: string
    |-- value: map (valueContainsNull = true)
    |    |-- key : string
    |    |-- value : string (valueContainsNull = true)
Answer #1, by kadbb459

Welcome to StackOverflow. Perhaps this function can help:

def extractNestedKey(key: String, nestedKey: String) = udf { in: Map[String, Map[String, String]] => in(key) - nestedKey }

Consider a simple DataFrame (I create it from a Dataset because that is very simple):

import spark.implicits._  // encoders for createDataset and the $"..." column syntax

val ds = spark.createDataset(Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2")))).withColumnRenamed("value", "Data")

It looks like this:

+---------------------------------------+
|Data                                   |
+---------------------------------------+
|{key -> {key -> value, key2 -> value2}}|
+---------------------------------------+

Apply the UDF:

ds.withColumn("Data2", extractNestedKey("key", "key2")($"Data"))

This creates a column holding the inner map without the nested key:

+---------------------------------------+--------------+
|Data                                   |Data2         |
+---------------------------------------+--------------+
|{key -> {key -> value, key2 -> value2}}|{key -> value}|
+---------------------------------------+--------------+
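
Note that the UDF above returns only the inner map for one outer key, and `in(key)` throws if that key is absent. If the goal is to keep the outer map and drop the nested key from every inner map, a minimal sketch could look like the following. The helper name `dropNestedKey` is hypothetical, and the null handling is an assumption based on the nullable schema in the question. The parameter type is `Map[String, Map[String, String]]` because Spark hands a nested MapType column to a Scala UDF as nested Scala maps; `Row` is only used for struct values.

```scala
// Hypothetical helper (not from the original answer): removes `nestedKey`
// from every inner map and keeps the outer keys unchanged.
def dropNestedKey(nestedKey: String)(
    in: Map[String, Map[String, String]]): Map[String, Map[String, String]] =
  if (in == null) null // the column is nullable, so pass nulls through
  else in.map { case (outerKey, inner) =>
    // inner maps may be null too (valueContainsNull = true)
    outerKey -> (if (inner == null) null else inner - nestedKey)
  }

val sample = Map("key" -> Map("key" -> "value", "key2" -> "value2"))
println(dropNestedKey("key2")(sample)) // prints Map(key -> Map(key -> value))
```

With Spark on the classpath, the helper should be usable as a UDF by wrapping it, for example `udf((m: Map[String, Map[String, String]]) => dropNestedKey("key2")(m))`, and applying the result to `$"Data"` as above.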
Answer #2, by o2gm4chl

You don't need a UDF, since UDFs are expensive; you can use the map method instead. I used a Dataset here, but you can use a DataFrame as well.

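import spark.implicits._  // needed for .toDS() and the Dataset encoders

case class Nst(key: String, value: Map[String, Map[String, String]])

val removeList = List("key222")
val ds = Seq(Nst("key1", Map("key11" -> Map("key111" -> "111", "key222" -> "222")))).toDS()

// Rebuild the outer map, dropping every key in removeList from each nested map.
// A plain map is used instead of mapValues, which returns a lazy view on Scala 2.13.
val result = ds.map(nst => nst.copy(value = nst.value.map { case (k, nestedMap) => k -> (nestedMap -- removeList) }))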

result.show(false)
+----+--------------------------+
|key |value                     |
+----+--------------------------+
|key1|{key11 -> {key111 -> 111}}|
+----+--------------------------+
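
If the data is a plain DataFrame, the same result can probably be reached with Spark's built-in higher-order functions, which avoids both a UDF and a typed `map`. The sketch below assumes Spark 3.0 or later, where `transform_values` and `map_filter` are available in `org.apache.spark.sql.functions`. The local `SparkSession`, the sample data, and the column name `Data2` are illustrative assumptions, not part of the original answers.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map_filter, transform_values}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("drop-nested-key").getOrCreate()
import spark.implicits._

// Same shape as the question: map<string, map<string, string>>
val df = Seq(Map("key" -> Map("key" -> "value", "key2" -> "value2"))).toDF("Data")

// For every outer entry, keep only the inner entries whose key is not "key2".
val cleaned = df.withColumn(
  "Data2",
  transform_values(col("Data"), (_, inner) => map_filter(inner, (k, _) => k =!= lit("key2")))
)

cleaned.show(false)
```

Because the whole expression stays inside Spark SQL, the optimizer can see it, which is the usual argument for preferring built-in functions over UDFs.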
