I have the following DataFrame with the schema below:
db.printSchema()
root
|-- _id: struct (nullable = true)
| |-- oid: string (nullable = true)
|-- id: string (nullable = true)
|-- sparse_rep: struct (nullable = true)
| |-- 1: double (nullable = true)
| |-- 10: double (nullable = true)
| |-- 11: double (nullable = true)
| |-- 12: double (nullable = true)
| |-- 13: double (nullable = true)
| |-- 14: double (nullable = true)
| |-- 15: double (nullable = true)
| |-- 17: double (nullable = true)
| |-- 18: double (nullable = true)
| |-- 2: double (nullable = true)
| |-- 20: double (nullable = true)
| |-- 21: double (nullable = true)
| |-- 22: double (nullable = true)
| |-- 23: double (nullable = true)
| |-- 24: double (nullable = true)
| |-- 25: double (nullable = true)
| |-- 26: double (nullable = true)
| |-- 27: double (nullable = true)
| |-- 3: double (nullable = true)
| |-- 4: double (nullable = true)
| |-- 7: double (nullable = true)
| |-- 9: double (nullable = true)
|-- title: string (nullable = true)
All of the fields here look straightforward except for the sparse representation. That sparse_rep object was originally created in Spark as a Map[Int,Double] and then written to MongoDB (which stores a map as a document, which is presumably why it comes back as a struct with one field per key).
However, when I try to coerce it back to Map[Int,Double] using a Dataset:
case class blogRow(_id:String, id:Int, sparse_rep:Map[Int,Double],title:String)
val blogRowEncoder = Encoders.product[blogRow]
db.as[blogRow](blogRowEncoder)
I get the following error:
Caused by: org.apache.spark.sql.AnalysisException: need a map field but got struct<1:double,10:double,11:double,12:double,13:double,14:double,15:double,17:double,18:double,2:double,20:double,21:double,22:double,23:double,24:double,25:double,26:double,27:double,3:double,4:double,7:double,9:double>;
2 Answers

Answer 1 (cwtwac6a):
Convert the struct type to a map type and then cast to the case class. The schema of the data in the DataFrame and the fields in the case class should match. Check the code below.
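The code itself did not survive in this copy of the answer, so here is a minimal sketch of the idea, assuming the question's db DataFrame and blogRow case class: enumerate the field names of the sparse_rep struct, rebuild them as a Map[Int,Double] column with the map() function, and also flatten _id.oid and cast id so that every column lines up with the case class.

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions._

// Field names of the sparse_rep struct: "1", "10", "11", ...
val keys = db.select("sparse_rep.*").columns

// Alternating key/value columns for map(): lit(1), sparse_rep.1, lit(10), sparse_rep.10, ...
val kvCols = keys.flatMap(k => Seq(lit(k.toInt), col(s"sparse_rep.`$k`")))

val ds = db
  .withColumn("sparse_rep", map(kvCols: _*))  // struct -> map<int,double>
  .withColumn("_id", col("_id.oid"))          // struct<oid:string> -> string
  .withColumn("id", col("id").cast("int"))    // string -> int
  .as[blogRow](Encoders.product[blogRow])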
Answer 2 (nc1teljy):
Another option: convert the schema of the input DataFrame to match the case class, then convert Dataset[Row] -> Dataset[blogRow], where the case class is as in the sketch below.
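The rest of this answer was also lost, so the following is a sketch of this variant, assuming a SparkSession named spark and the same db DataFrame as in the question. The difference from the first answer is that the whole reshaping happens in a single select that mirrors the case class, after which as[] performs the Dataset[Row] -> Dataset[blogRow] conversion.

case class blogRow(_id: String, id: Int, sparse_rep: Map[Int, Double], title: String)

import org.apache.spark.sql.functions._
import spark.implicits._  // assumes a SparkSession named spark

val keys = db.select("sparse_rep.*").columns

// One select that produces exactly the columns (names and types) that blogRow expects.
val reshaped = db.select(
  col("_id.oid").as("_id"),
  col("id").cast("int").as("id"),
  map(keys.flatMap(k => Seq(lit(k.toInt), col(s"sparse_rep.`$k`"))): _*).as("sparse_rep"),
  col("title")
)

val ds = reshaped.as[blogRow]  // Dataset[Row] -> Dataset[blogRow]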