Spark reads a dataset incorrectly and in a strange way

zed5wv10  posted on 2021-05-24  in  Spark

I'm running into a strange problem when reading a file from S3. This is what I'm doing:

val previousDay = spark.read
      .option("header", "false")
      .schema(schema)
      .csv(loadPath)
      .cache()

This is the schema:

StructType(
    Array(
      StructField("location_id", DataTypes.StringType, nullable = true),
      StructField("uuid", DataTypes.StringType, nullable = true),
      StructField("country_code", DataTypes.StringType, nullable = true),
      StructField("shard", DataTypes.StringType, nullable = true),
      StructField("has_activity", DataTypes.StringType, nullable = true)
    )
  )

This is the CSV:

"location_id","uuid","country_code","shard","has_activity"
"35fb2f0XX","06d0XX","FRA","eu","t"
"9ee98XX","7cd3c7XX","DEU","eu",""
"9d193XX","128abXX","ITA","eu",""

However, when I call show on previousDay, this is what I get:

+-----------+--------+------------+--------+-----+
|lid        |uid     |country     |activity|shard|
+-----------+--------+------------+--------+-----+
|location_id|uuid    |country_code|shard   |eu   |
|35fb2f0XX  |6d0XX   |FRA         |eu      |eu   |
|9ee98XX    |7cd3c7XX|DEU         |eu      |eu   |
|9d193XX    |128abXX |ITA         |eu      |eu   |
+-----------+--------+------------+--------+-----+

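As shown, the shard value is duplicated across two columns, and the activity value is gone entirely.
I have no idea what is going on. I would greatly appreciate any input on this.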
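
A minimal diagnostic sketch, not a confirmed fix. Two things in the output can be checked separately. First, the header line shows up as a data row, which is what `option("header", "false")` does with a file that starts with a header. Second, a column that holds the same value on every row, header row included, would be consistent with Spark taking `shard` from the directory name instead of from the file. That second point is only an assumption: it would require `loadPath` to contain a segment such as `shard=eu`, which the question does not show. The object name `CsvReadCheck` and passing the path as the first program argument are hypothetical choices made for this sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

object CsvReadCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-read-check")
      .master("local[*]") // assumption: local session for debugging only
      .getOrCreate()

    // Hypothetical: the real loadPath from the question is passed as the first argument.
    val loadPath = args(0)

    // Same schema as in the question.
    val schema = StructType(
      Array(
        StructField("location_id", DataTypes.StringType, nullable = true),
        StructField("uuid", DataTypes.StringType, nullable = true),
        StructField("country_code", DataTypes.StringType, nullable = true),
        StructField("shard", DataTypes.StringType, nullable = true),
        StructField("has_activity", DataTypes.StringType, nullable = true)
      )
    )

    // Step 1: read with no schema and no header handling.
    // File columns come back as _c0, _c1, ...; any column with a different
    // name at the end of the schema was derived from the path, not the file.
    val raw = spark.read
      .option("header", "false")
      .csv(loadPath)
    raw.printSchema()
    raw.show(5, truncate = false)

    // Step 2: read with the schema, skipping the header line, without cache(),
    // so that cached results cannot mask what the reader actually returns.
    val previousDay = spark.read
      .option("header", "true")
      .schema(schema)
      .csv(loadPath)
    previousDay.printSchema()
    previousDay.show(5, truncate = false)

    spark.stop()
  }
}
```

If step 1 prints five `_c` columns and nothing else, the path-derived column idea does not apply and the mismatch lies elsewhere, for example in a schema or column renaming that differs from the one posted (the output uses `lid`, `uid`, `country`, `activity`, `shard`, which are not the names in the schema above).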
