I have two datasets like this:
val jsonStr ="""{
"TransactionId": 1,
"TransactionName": "Name",
"Order": 12,
"ReplaceStrings": [
"UNDEFINED","INVALID"
],
"Country" : "China"
}"""
import spark.implicits._  // needed for .toDS / .toDF outside spark-shell
val configurations = spark.read.json(Seq(jsonStr).toDS)
This holds all of my configurations and filters.
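Because the configuration here is a single JSON object, one option is to read that row back to the driver as plain Scala values. The following is a minimal sketch that assumes exactly one configuration row (`first()` simply ignores any others). The local `SparkSession` setup is only there to make the snippet self-contained; spark-shell already provides `spark`. The `ReplaceStrings` array is read with `getSeq`, so it comes back as a `Seq[String]`, not an `Array[String]`.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Local session so the snippet runs on its own; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("read-config").getOrCreate()
import spark.implicits._

val jsonStr = """{"TransactionId": 1, "TransactionName": "Name", "Order": 12, "ReplaceStrings": ["UNDEFINED", "INVALID"], "Country": "China"}"""
val configurations = spark.read.json(Seq(jsonStr).toDS)

// Assumes a single configuration row; first() returns it as a Row.
val cfg: Row = configurations.first()
val country: String = cfg.getAs[String]("Country")
// JSON arrays are inferred as array<string>; getSeq returns them as a Seq,
// which avoids casting the underlying wrapper class to Array[String].
val replaceStrings: Seq[String] = cfg.getSeq[String](cfg.fieldIndex("ReplaceStrings"))

println(s"country=$country, replaceStrings=${replaceStrings.mkString(",")}")
```

These plain values can then be used in ordinary column expressions, for example comparing `Country` against `country` or testing `LName` with `isin(replaceStrings: _*)`.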
My Data
val data = Seq(
  (1, "Mindy", "Devaney", "mdevaney0@cnbc.com", "Female", "United States", "UTF-8"),
  (2, "Charmain", "Clear", "candriolli1@miitbeian.gov.cn", "Female", "China", "UTF-8"),
  (3, "Dilan", "UNDEFINED", "dphilipeaux2@jalbum.net", "Male", "China", "Windows-1252")
).toDF("id", "Fname", "LName", "mailid", "Gender", "Country", "Codepage")
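Now my task is to join the configuration data (which carries the filters) with the data above and retrieve the corresponding result: the filter applies to the country China, and every LName whose value is UNDEFINED should be replaced with an empty string.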
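A minimal sketch of this task using only built-in functions, under two assumptions not stated in the question: configuration rows are matched to data rows by `Country` (so the inner join also acts as the China filter), and every value listed in `ReplaceStrings`, including `INVALID`, is blanked, not only `UNDEFINED`. The local `SparkSession` is only there to keep the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, lit, when}

// Local session so the snippet runs on its own; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("config-join").getOrCreate()
import spark.implicits._

val jsonStr = """{"TransactionId": 1, "TransactionName": "Name", "Order": 12, "ReplaceStrings": ["UNDEFINED", "INVALID"], "Country": "China"}"""
val configurations = spark.read.json(Seq(jsonStr).toDS)

val data = Seq(
  (1, "Mindy", "Devaney", "mdevaney0@cnbc.com", "Female", "United States", "UTF-8"),
  (2, "Charmain", "Clear", "candriolli1@miitbeian.gov.cn", "Female", "China", "UTF-8"),
  (3, "Dilan", "UNDEFINED", "dphilipeaux2@jalbum.net", "Male", "China", "Windows-1252")
).toDF("id", "Fname", "LName", "mailid", "Gender", "Country", "Codepage")

// Assumption: a configuration row applies to data rows with the same Country,
// so the inner join on Country doubles as the "Country = China" filter.
val joined = data.join(configurations.select("Country", "ReplaceStrings"), Seq("Country"))

// Blank out LName when it appears in that row's ReplaceStrings array.
// The SQL form of array_contains compares the array against another column.
val result = joined
  .withColumn("LName",
    when(expr("array_contains(ReplaceStrings, LName)"), lit("")).otherwise(col("LName")))
  .drop("ReplaceStrings")

result.orderBy("id").show(false)
```

With the sample data this keeps ids 2 and 3 and turns Dilan's `LName` into an empty string; the United States row has no matching configuration and drops out of the inner join.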
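I tried writing this with some UDFs, but I am still stuck on how to pass in the JSON values, which come through as a WrappedArray, or whether to use a Seq data type instead.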
If anyone has seen a similar case or has an idea, please share it with me.
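On the UDF question: after a join, the `ReplaceStrings` array is just another column, so it can be passed to the UDF as an ordinary argument whose Scala type is `Seq[String]`. The sketch below assumes the join on `Country` as the way to match configuration and data; the UDF name `blankIfListed` is hypothetical. Built-in functions are usually preferable because Catalyst cannot optimize inside a UDF, but this shows the parameter type that works.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session so the snippet runs on its own; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("config-udf").getOrCreate()
import spark.implicits._

val jsonStr = """{"TransactionId": 1, "TransactionName": "Name", "Order": 12, "ReplaceStrings": ["UNDEFINED", "INVALID"], "Country": "China"}"""
val configurations = spark.read.json(Seq(jsonStr).toDS)

val data = Seq(
  (1, "Mindy", "Devaney", "mdevaney0@cnbc.com", "Female", "United States", "UTF-8"),
  (2, "Charmain", "Clear", "candriolli1@miitbeian.gov.cn", "Female", "China", "UTF-8"),
  (3, "Dilan", "UNDEFINED", "dphilipeaux2@jalbum.net", "Male", "China", "Windows-1252")
).toDF("id", "Fname", "LName", "mailid", "Gender", "Country", "Codepage")

// Hypothetical UDF: an array<string> column arrives as a Seq[String].
// On Spark 2.x the runtime value is a WrappedArray, so declaring the
// parameter as Array[String] fails with a ClassCastException.
val blankIfListed = udf { (value: String, replace: Seq[String]) =>
  if (value != null && replace != null && replace.contains(value)) "" else value
}

// Assumption: configuration and data rows are matched on Country.
val result = data
  .join(configurations.select("Country", "ReplaceStrings"), Seq("Country"))
  .withColumn("LName", blankIfListed(col("LName"), col("ReplaceStrings")))
  .drop("ReplaceStrings")

result.orderBy("id").show(false)
```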
1 answer
Check the code below.