I am trying to apply nullable = false when reading my JSON file, but the resulting schema always shows the default nullable = true. I wrote my own schema:
import org.apache.spark.sql.types._

val carsSchema = StructType(Array(
  StructField("Name", StringType),
  StructField("Miles_per_Gallon", DoubleType, nullable = false),
  StructField("Cylinders", LongType),
  StructField("Displacement", DoubleType),
  StructField("Horsepower", LongType),
  StructField("Weight_in_lbs", LongType),
  StructField("Acceleration", DoubleType),
  StructField("Year", StringType),
  StructField("Origin", StringType)))
df.printSchema()
root
|-- Name: string (nullable = true)
|-- Miles_per_Gallon: double (nullable = true)
|-- Cylinders: long (nullable = true)
|-- Displacement: double (nullable = true)
|-- Horsepower: long (nullable = true)
|-- Weight_in_lbs: long (nullable = true)
|-- Acceleration: double (nullable = true)
|-- Year: string (nullable = true)
|-- Origin: string (nullable = true)
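After some research, I read the file into an RDD first and then applied the schema while building the DataFrame, using the code below.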
val jsonRDD = spark.sparkContext.textFile(carsDataWithErrorjsonfile)
val carDF = spark.read
  //.format("json")
  //.option("inferSchema", true)
  .schema(carsSchema)
  .option("mode", "permissive") // failFast, permissive, dropMalformed
  .json(jsonRDD)
This works as expected, but the IDE warns that the json method overload that takes an RDD is deprecated. Is there another way to set nullable to false?
Sample dataset
{"Name":"chevrolet chevelle malibu", "Miles_per_Gallon":18, "Cylinders":8, "Displacement":307, "Horsepower":130, "Weight_in_lbs":3504, "Acceleration":12, "Year":"1970-01-01", "Origin":"USA"}
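One possible direction, as a minimal sketch rather than a confirmed fix: json(Dataset[String]) is the non-deprecated counterpart of json(RDD[String]), and rebuilding the DataFrame with spark.createDataFrame(rows, schema) takes the nullable flags from the schema itself. Whether the reader alone keeps nullable = false seems to depend on the Spark version, so the sketch reapplies the schema explicitly. The local SparkSession, the shortened three-column schema, the inline jsonLines sample, and the names parsedDF and strictDF are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("nullable-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Shortened version of carsSchema; only Miles_per_Gallon is non-nullable.
val carsSchema = StructType(Array(
  StructField("Name", StringType),
  StructField("Miles_per_Gallon", DoubleType, nullable = false),
  StructField("Origin", StringType)))

// Inline sample standing in for the JSON file (assumption for this sketch).
val jsonLines = Seq(
  """{"Name":"chevrolet chevelle malibu", "Miles_per_Gallon":18, "Origin":"USA"}"""
).toDS()

// json(Dataset[String]) replaces the deprecated json(RDD[String]) overload.
val parsedDF = spark.read.schema(carsSchema).json(jsonLines)

// Rebuild the DataFrame from its rows with the exact schema, so the
// nullable flags come from carsSchema rather than from the reader.
val strictDF: DataFrame = spark.createDataFrame(parsedDF.rdd, carsSchema)

strictDF.printSchema()
```

Note that this only changes the schema metadata. If the data actually contains nulls in a column declared non-nullable, the job may fail at runtime, so it is probably worth filtering or validating those rows first.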