How do I pretty-print a Spark schema DDL string?

m2xkgtsf  posted on 2023-10-23  in  Apache

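The schema string I get from .toDDL is compact, but for a complex schema it is very hard to read. How can I format it with indentation and line breaks so that it is easier to read?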

06odsfpq  #1

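As far as I know, there is no built-in function for formatting a DDL string.
I use the code below to pretty-print it. It adds a printDDL method to StructType objects such as df.schema. It works by creating a scratch table from the schema and printing the column lines of its SHOW CREATE TABLE output.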

scala> df.printSchema
root
 |-- author: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- category: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- editor: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- isbn: string (nullable = true)
 |-- title: string (nullable = true)
scala> df.schema.toDDL
res86: String = author STRUCT<firstname: STRING, lastname: STRING>,category ARRAY<STRING>,editor STRUCT<firstname: STRING, lastname: STRING>,isbn STRING,title STRING
scala> :paste
// Entering paste mode (ctrl-D to finish)

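// Adds printDDL to any StructType, e.g. df.schema.
// Written for spark-shell, where `spark` and `spark.implicits._` are already in scope.
implicit class DDL(val schema: org.apache.spark.sql.types.StructType) {
    def printDDL: Unit = {
        // Scratch table used only so that Spark renders the schema; it is dropped and recreated on every call.
        val tableName = "_source"
        spark.sql(s"DROP TABLE IF EXISTS ${tableName}")
        spark.sql(s"CREATE TABLE IF NOT EXISTS ${tableName}(${schema.toDDL}) USING orc")
        // SHOW CREATE TABLE returns the whole statement as one multi-line string.
        println(spark.sql(s"SHOW CREATE TABLE ${tableName}")
            .as[String]
            .head
            .split("\n")
            // Keep only the column lines, then trim the trailing ")".
            .filterNot(l => l.contains("CREATE") || l.contains("USING"))
            .mkString("\n ", "\n ", "")
            .dropRight(1))
    }
}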
scala> df.schema.printDDL

   author STRUCT<firstname: STRING, lastname: STRING>,
   category ARRAY<STRING>,
   editor STRUCT<firstname: STRING, lastname: STRING>,
   isbn STRING,
   title STRING

scala>
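
The output above still keeps each nested STRUCT on a single line. If nested fields should be indented as well, one option is to reformat the string returned by toDDL directly, without creating a table. The following is only a minimal sketch under some assumptions: DdlPretty and format are hypothetical names introduced here (not part of Spark), the input is assumed to look like the toDDL output shown earlier, and backslash-escaped quotes inside COMMENT strings are not handled.

```scala
object DdlPretty {
  /** Puts each top-level column and each STRUCT field on its own line.
    * ARRAY<...>, MAP<...> and DECIMAL(p,s) are left on one line. */
  def format(ddl: String, indent: String = "  "): String = {
    val out = new StringBuilder
    var stack = List.empty[Boolean]   // one entry per open '<'; true means STRUCT<
    var parens = 0                    // depth inside (...), e.g. DECIMAL(10,2)
    var quote: Option[Char] = None    // set while inside `...` or '...'
    var i = 0

    def depth: Int = stack.count(identity)
    def newline(): Unit = out.append('\n').append(indent * depth)
    // A comma starts a new line only at top level or directly inside a STRUCT.
    def breakable: Boolean = parens == 0 && stack.headOption.getOrElse(true)

    while (i < ddl.length) {
      val c = ddl.charAt(i)
      quote match {
        case Some(q) =>
          out.append(c)
          if (c == q) quote = None
        case None => c match {
          case '`' | '\'' => quote = Some(c); out.append(c)
          case '(' => parens += 1; out.append(c)
          case ')' => parens -= 1; out.append(c)
          case '<' =>
            val isStruct = out.toString.toUpperCase.endsWith("STRUCT")
            stack = isStruct :: stack
            out.append(c)
            if (isStruct) newline()
          case '>' if stack.nonEmpty =>
            val wasStruct = stack.head
            stack = stack.tail
            if (wasStruct) newline()
            out.append(c)
          case ',' if breakable =>
            out.append(c)
            newline()
            // Drop the spaces that follow the comma in the source string.
            while (i + 1 < ddl.length && ddl.charAt(i + 1) == ' ') i += 1
          case _ => out.append(c)
        }
      }
      i += 1
    }
    out.toString
  }
}
```

In spark-shell this could be called as println(DdlPretty.format(df.schema.toDDL)). For the schema in this answer, the author column would come out as "author STRUCT<" followed by the indented firstname and lastname lines and a closing ">," line, while category ARRAY<STRING> stays on one line.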
