I'm running into a strange problem when reading a file from S3. Here is what I'm doing:
val previousDay = spark.read
  .option("header", "false")
  .schema(schema)
  .csv(loadPath)
  .cache()
Here is the schema:
StructType(
  Array(
    StructField("location_id", DataTypes.StringType, nullable = true),
    StructField("uuid", DataTypes.StringType, nullable = true),
    StructField("country_code", DataTypes.StringType, nullable = true),
    StructField("shard", DataTypes.StringType, nullable = true),
    StructField("has_activity", DataTypes.StringType, nullable = true)
  )
)
And here is the CSV:
"location_id","uuid","country_code","shard","has_activity"
"35fb2f0XX","06d0XX","FRA","eu","t"
"9ee98XX","7cd3c7XX","DEU","eu",""
"9d193XX","128abXX","ITA","eu",""
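The file above starts with a header line, while the read uses header = "false", so Spark parses that line as an ordinary data row (which matches the location_id / uuid / country_code row at the top of the output below). A minimal local sketch of that behaviour follows, assuming Spark 2.2 or later (for DataFrameReader.csv(Dataset[String])); the object name CsvRepro and the inlined lines are hypothetical stand-ins for the S3 file at loadPath. It only checks the header handling and does not by itself explain the duplicated shard column.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// Hypothetical local reproduction: the CSV lines from the post are inlined
// instead of being read from S3 via loadPath.
object CsvRepro {
  val schema: StructType = StructType(
    Array(
      StructField("location_id", DataTypes.StringType, nullable = true),
      StructField("uuid", DataTypes.StringType, nullable = true),
      StructField("country_code", DataTypes.StringType, nullable = true),
      StructField("shard", DataTypes.StringType, nullable = true),
      StructField("has_activity", DataTypes.StringType, nullable = true)
    )
  )

  val lines: Seq[String] = Seq(
    "\"location_id\",\"uuid\",\"country_code\",\"shard\",\"has_activity\"",
    "\"35fb2f0XX\",\"06d0XX\",\"FRA\",\"eu\",\"t\"",
    "\"9ee98XX\",\"7cd3c7XX\",\"DEU\",\"eu\",\"\"",
    "\"9d193XX\",\"128abXX\",\"ITA\",\"eu\",\"\""
  )

  // Reads the inlined lines with the same schema as the post.
  // header = true skips the first line; header = false parses it as data.
  def load(spark: SparkSession, header: Boolean): DataFrame = {
    val ds = spark.createDataset(lines)(Encoders.STRING)
    spark.read
      .option("header", header.toString)
      .schema(schema)
      .csv(ds)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-repro").getOrCreate()
    load(spark, header = false).show() // header line shows up as a data row
    load(spark, header = true).show()  // header line is skipped
    spark.stop()
  }
}
```

Comparing the two show() outputs side by side makes it easy to see which rows come from the header line and which come from the actual data.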
However, when I call previousDay.show(), this is what I get:
+-----------+--------+------------+--------+-----+
|lid        |uid     |country     |activity|shard|
+-----------+--------+------------+--------+-----+
|location_id|uuid    |country_code|shard   |eu   |
|35fb2f0XX  |6d0XX   |FRA         |eu      |eu   |
|9ee98XX    |7cd3c7XX|DEU         |eu      |eu   |
|9d193XX    |128abXX |ITA         |eu      |eu   |
+-----------+--------+------------+--------+-----+
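As you can see, the shard value is duplicated across two columns and the activity value has disappeared entirely.
I have no idea what is going on. I would really appreciate any input on this.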
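The column names in the output (lid, uid, country, activity, shard) differ from the schema names, so one hedged way to narrow this down is to look at the file with no user-supplied schema and compare that with previousDay.columns right after the read. A minimal sketch, assuming the same SparkSession setup as the original code; RawCsvCheck and readRaw are hypothetical names, and the path argument would be the post's loadPath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical diagnostic helper: read the file with no user schema so Spark
// names the columns _c0, _c1, ... and keeps every field as a plain string.
object RawCsvCheck {
  def readRaw(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "false")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("raw-csv-check").getOrCreate()
    val raw = readRaw(spark, args(0)) // e.g. the same loadPath used in the post
    println(raw.columns.mkString(", ")) // how many fields each line was split into
    raw.show(truncate = false)          // raw field values, untruncated
    spark.stop()
  }
}
```

If the raw read shows five distinct fields per line while the displayed DataFrame does not, the duplication is more likely introduced somewhere between the read and the show() call than by the CSV parser itself.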