我有一个如下所示的Dataframe。我要将最后一列trandata从string类型转换为maptype。输出应该与我在第二个表中所示的类似。
我已经编写了udf,但它需要字符串并转换为maptype,我很难用sql.row作为输入获得类似的输出:(
def stringToMap(value: String): Map[String, String] = {
var valMap = collection.mutable.Map[String, String]()
val values = value.split(",")
for (i <- values) {
valMap = valMap + (i.split("=")(0) -> i.split("=")(1))
}
return valMap
}
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|MESSAGEID |CATEGORY|TRANDATA |
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|03010 |A |threadID=123sada,ProcType=InfraLogging,TxnID=4mjx8wfogf
|03011 |A |threadID=xmjxe2j0jz,ProcType=InfraLogging,TxnID=4mjxe2j0tf
|09941 |D |compTxnID=xmawdew0tf,to=ABCD,threadID=4mjxe2j0jz,ProcType=InfraLogging
|00994 |D |compTxnID=xmjxe2j0tf,to=XYZA,threadID=34jxasde0jz,ProcType=InfraLogging
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
表2:导出输出-第三列为maptype
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|MESSAGEID |CATEGORY|TRANDATA |
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|03010 |A |Map(threadID -> 123sada,ProcType -> InfraLogging,TxnID -> 4mjx8wfogf)
1条答案
按热度按时间vaqhlq811#
对于spark 2.4+,您可以将字符串拆分为键值对,然后使用transform将键和值分隔为两个数组列,然后使用map\ from\ arrays创建最终的Map。
输出: