I am reading data from MongoDB with Spark, using the MongoDB Spark connector, but I have a problem with nested documents. For example, my MongoDB documents look like this:
/* 1 */
{
    "_id": "user001",
    "group": 100,
    "profile": {
        "age": 21,
        "fname": "John",
        "lname": "Doe",
        "email": "johndoe@example.com"
    }
}

/* 2 */
{
    "_id": "user002",
    "group": 400,
    "profile": {
        "age": 23,
        "fname": "Jane",
        "lname": "Doe",
        "email": "janedoe@example.com"
    }
}
And I have a case class like this:
case class User(_id: String, group: Option[Long], profile: Map[String, Option[String]])
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

val spark = SparkSession
  .builder
  .appName("test-app")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

val readConfig = ReadConfig(Map("uri" -> xxxx, "collection" -> xxx, "database" -> xxxx))
val userMongoDF: DataFrame = MongoSpark.load[User](spark, readConfig)
val userDF: DataFrame = userMongoDF.filter(userMongoDF("group") > 50)
userDF.printSchema()
userDF.show()
This gives the following result:
root
 |-- _id: string (nullable = true)
 |-- group: long (nullable = true)
 |-- profile: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
+-------+-----+-------------------------------------------------------------------------+
|_id    |group|profile                                                                  |
+-------+-----+-------------------------------------------------------------------------+
|user001|100  |Map(age -> 21, email -> johndoe@example.com, fname -> John, lname -> Doe)|
|user002|400  |Map(age -> 23, email -> janedoe@example.com, fname -> Jane, lname -> Doe)|
+-------+-----+-------------------------------------------------------------------------+
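For comparison, the same collection can also be loaded without binding it to the User case class, so that the connector infers the schema from the BSON documents themselves. The following is only a sketch that reuses the spark and readConfig values from above; with the connector's own inference, profile would normally come back as a struct rather than a map, and age as a numeric type:

// Sketch only: load the collection untyped and let the connector infer the schema,
// to compare it against the schema produced by the User case class above.
import com.mongodb.spark.MongoSpark

val inferredDF = MongoSpark.load(spark, readConfig)  // no [User] type parameter
inferredDF.printSchema()                             // profile expected as a struct, age as an integer type
inferredDF.select("profile.email").show(false)       // dotted access into the struct field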
But when I select the nested field profile, I get the following error message:
userDF.select(col("_id"),col("profile.email").cast(StringType) as "email").show()
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 437, XXXXX, executor 7): org.bson.BsonInvalidOperationException: Invalid state INITIAL
By contrast, a select that does not touch profile works fine: userDF.select(col("_id")).show()
The Spark version is 2.1.0 and MongoDB is 3.6.19.
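One alternative worth sketching: since age is stored as a number while the case class declares Map[String, Option[String]], the nested document could be modelled with a dedicated case class instead of a map. The Profile and UserDoc names below are hypothetical, chosen only to match the sample documents, and whether this avoids the BsonInvalidOperationException is an assumption rather than something verified here:

// Hypothetical alternative modelling: profile as a nested case class instead of a map.
// Profile/UserDoc are illustrative names, not part of the original code; age is declared
// numeric because the documents store it as a number.
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.functions.col

case class Profile(age: Option[Long],
                   fname: Option[String],
                   lname: Option[String],
                   email: Option[String])

case class UserDoc(_id: String, group: Option[Long], profile: Option[Profile])

val typedDF = MongoSpark.load[UserDoc](spark, readConfig)
typedDF.filter(col("group") > 50)
  .select(col("_id"), col("profile.email") as "email")  // profile.email resolves as a struct field
  .show(false)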