Spark JSON: apply schema with nullable=false

tv6aics1  posted on 2021-05-27  in Spark

I am trying to apply nullable=false to a field when reading my JSON file, but the resulting schema always shows the default nullable=true. I wrote my own schema:

import org.apache.spark.sql.types._

// Only Miles_per_Gallon is declared non-nullable; StructField defaults to nullable = true.
val carsSchema = StructType(Array(
    StructField("Name", StringType),
    StructField("Miles_per_Gallon", DoubleType, nullable = false),
    StructField("Cylinders", LongType),
    StructField("Displacement", DoubleType),
    StructField("Horsepower", LongType),
    StructField("Weight_in_lbs", LongType),
    StructField("Acceleration", DoubleType),
    StructField("Year", StringType),
    StructField("Origin", StringType)))

df.printSchema()

root
 |-- Name: string (nullable = true)
 |-- Miles_per_Gallon: double (nullable = true)
 |-- Cylinders: long (nullable = true)
 |-- Displacement: double (nullable = true)
 |-- Horsepower: long (nullable = true)
 |-- Weight_in_lbs: long (nullable = true)
 |-- Acceleration: double (nullable = true)
 |-- Year: string (nullable = true)
 |-- Origin: string (nullable = true)

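After some research, I read the file into an RDD first and then applied the schema when creating the DataFrame, using the code below.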

val jsonRDD = spark.sparkContext.textFile(carsDataWithErrorjsonfile)
val carDF = spark.read
  //.format("json")
  //.option("inferSchema", true)
  .schema(carsSchema)
  .option("mode", "permissive") // one of: failFast, permissive, dropMalformed
  .json(jsonRDD) // json(RDD[String]) overload, which the IDE flags as deprecated


This works as expected, but the IDE shows that the json method that takes an RDD is deprecated. Is there another way to set nullable to false?

Sample dataset

{"Name":"chevrolet chevelle malibu", "Miles_per_Gallon":18, "Cylinders":8, "Displacement":307, "Horsepower":130, "Weight_in_lbs":3504, "Acceleration":12, "Year":"1970-01-01", "Origin":"USA"}
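
One possible way to get the nullable = false flag without the deprecated RDD overload is to read the file normally and then re-apply the schema with `SparkSession.createDataFrame(rowRDD, schema)`. The sketch below is illustrative only: the object name `NullableSchemaSketch`, the helper `withSchema`, and the path `cars.json` are assumptions, not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object NullableSchemaSketch {
  // Schema from the question; only Miles_per_Gallon is declared non-nullable.
  val carsSchema: StructType = StructType(Array(
    StructField("Name", StringType),
    StructField("Miles_per_Gallon", DoubleType, nullable = false),
    StructField("Cylinders", LongType),
    StructField("Displacement", DoubleType),
    StructField("Horsepower", LongType),
    StructField("Weight_in_lbs", LongType),
    StructField("Acceleration", DoubleType),
    StructField("Year", StringType),
    StructField("Origin", StringType)))

  // Hypothetical helper: rebuilds the DataFrame from its rows with the given
  // schema, so the nullable flags declared in `schema` are the ones reported.
  def withSchema(spark: SparkSession, df: DataFrame, schema: StructType): DataFrame =
    spark.createDataFrame(df.rdd, schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullable-sketch")
      .getOrCreate()

    // "cars.json" is a placeholder path (assumption), one JSON object per line.
    val raw = spark.read
      .schema(carsSchema)
      .option("mode", "permissive")
      .json("cars.json")

    val carDF = withSchema(spark, raw, carsSchema)
    carDF.printSchema()

    spark.stop()
  }
}
```

Spark does not check the data when the schema is re-applied this way, so a row with a null Miles_per_Gallon will most likely fail or misbehave later, when the row is actually evaluated. The non-deprecated replacement for the RDD overload is `json(Dataset[String])`, for example fed from `spark.read.textFile(...)`, but whether it keeps nullable = false appears to depend on the Spark version, so it is worth confirming with `printSchema()`.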
