I have a sample DataFrame:
val A = """[[15,["Printing Calculators"]],[13811,["Office Products"]]]"""
val B = """[[30888,["Paper & Printable Media"]],[223845,["Office Products"]]]"""
val C = """[[64,["Office Calculator Accessories"]]]"""
val df = List(A,B,C).toDF("bestseller_ranks")
I want to create a column that looks like this:
case class BestSellerRank(
  Ranking: Integer,
  Category: String
)
val A2 = List(new BestSellerRank(15,"Printing Calculators"),new BestSellerRank(13811,"Office Products"))
val B2 = List(new BestSellerRank(30888,"Paper & Printable Media"),new BestSellerRank(223845,"Office Products"))
val C2 =List(new BestSellerRank(64,"Office Calculator Accessories"))
val df2 = List(A2,B2,C2).toDF("bestseller_ranks_transformed")
I tried to write a UDF like this:
val BRUDF: UserDefinedFunction =
  udf(
    (bestseller_ranks: String) => {
      bestseller_ranks.split(",").fold(List.empty[BestSellerRank])(v => new BestSellerRank(v._1, v._2))
    }
  )
But this seems to be complete garbage, and I'm stuck. Thanks for your help!
2 Answers
Answer 1
I tried to implement this without a UDF. Maybe this helps.
Load the provided test data
Convert string -> array[struct]
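The two steps above could be sketched roughly as follows. This is only one possible way to do it with built-in functions, not necessarily the code this answer originally contained. It assumes Spark 3.0 or later (for the Scala `transform` function), that every entry has exactly one category, and that no category name contains the sequence `]],[`. The column name `bestseller_ranks_transformed` is taken from the question; the local `SparkSession` setup is an assumption for running outside `spark-shell`.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._

// Local session for experimenting; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bestseller-ranks")
  .getOrCreate()
import spark.implicits._

// Load the provided test data
val A = """[[15,["Printing Calculators"]],[13811,["Office Products"]]]"""
val B = """[[30888,["Paper & Printable Media"]],[223845,["Office Products"]]]"""
val C = """[[64,["Office Calculator Accessories"]]]"""
val df = List(A, B, C).toDF("bestseller_ranks")

// Convert string -> array[struct]
// 1. Strip the leading "[[" and the trailing "]]]".
// 2. Split on "]],[" so each piece looks like: 15,["Printing Calculators"
// 3. Pull the rank and the category out of each piece with regexes.
val entries = split(
  regexp_replace($"bestseller_ranks", "^\\[\\[|\\]\\]\\]$", ""),
  "\\]\\],\\["
)

val result = df.withColumn(
  "bestseller_ranks_transformed",
  transform(
    entries,
    (entry: Column) => struct(
      regexp_extract(entry, "^(\\d+)", 1).cast("int").as("Ranking"),
      regexp_extract(entry, "\\[\"(.*)\"$", 1).as("Category")
    )
  )
)

result.printSchema()
result.show(false)
```

The string handling here is tied to the exact shape of the sample rows. If the real data is less regular, parsing the column as JSON is probably the more robust route.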
Answer 2
Here is my solution: