Can Spark JSON schema metadata be mapped to Hive?

aiazj4mn  posted on 2021-05-27  in Spark

When using Apache Spark, we can easily generate a JSON file that describes the structure of a DataFrame. The structure looks like this:

{
  "type": "struct",
  "fields": [
    {
      "name": "employee_name",
      "type": "string",
      "nullable": true,
      "metadata": {
        "comment": "employee name", 
        "system_name":  "hr system", 
        "business_key": true, 
        "private_info": true
      }
    },
    {
      "name": "employee_job",
      "type": "string",
      "nullable": true,
      "metadata": {
        "comment": "employee job description", 
        "system_name":  "sap", 
        "business_key": false, 
        "private_info": false
      }
    }
  ]
}
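
For reference, a schema of this shape could be built and rendered roughly as follows. This is only a sketch: it covers the first field, uses the metadata keys from the JSON above, and relies on `MetadataBuilder` and `StructType.json` / `prettyJson` from `org.apache.spark.sql.types`.

```scala
import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}

// Custom per-column tags; the keys mirror the ones in the JSON above.
val nameMeta = new MetadataBuilder()
  .putString("comment", "employee name")
  .putString("system_name", "hr system")
  .putBoolean("business_key", true)
  .putBoolean("private_info", true)
  .build()

// A one-field schema carrying that metadata.
val schema = StructType(Seq(
  StructField("employee_name", StringType, nullable = true, nameMeta)
))

// Compact and pretty-printed JSON renderings of the schema.
val compactJson: String = schema.json
val prettyJson: String = schema.prettyJson
```
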

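When this information is stored in Hive, or when a DataFrame is read from Hive, Spark maps the column comment in the Hive metastore to the "comment" attribute in the metadata. But how can the DataFrame definition in JSON be mapped to a Hive table? Is it possible to store additional tags on the columns, such as the business_key or private_info flags?
Thanks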

q7solyqu #1

Yes, it is possible to store additional metadata. Create a Spark-compatible Hive table and add the required metadata in TBLPROPERTIES, as shown below.
Hive table

CREATE TABLE `employee_details`(
  `employee_name` string COMMENT 'employee name',
  `employee_job` string COMMENT 'employee job description') 
STORED AS ORC
TBLPROPERTIES (
  'spark.sql.sources.provider'='orc',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"employee_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"employee name\",\"business_key\":true,\"system_name\":\"hr system\",\"private_info\":true}},{\"name\":\"employee_job\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"employee job description\",\"business_key\":false,\"system_name\":\"sap\",\"private_info\":false}}]}'
)
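
The escaped schema string in the DDL above does not have to be typed by hand. Below is a minimal sketch of one way to derive it from an exported schema JSON, assuming the schema is short enough to fit in a single part, as in the example. The schema is shortened to one field here, and the names `schemaJson`, `escaped` and `tblProperties` are illustrative only.

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Schema JSON as exported from a DataFrame (shortened to one field here).
val schemaJson =
  """{"type":"struct","fields":[{"name":"employee_name","type":"string","nullable":true,"metadata":{"comment":"employee name","business_key":true}}]}"""

// Parsing validates the JSON and gives back a StructType.
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

// Escape double quotes the same way the DDL above does.
val escaped = schema.json.replace("\"", "\\\"")

// Assemble the TBLPROPERTIES clause with the property names used above.
val tblProperties =
  s"""TBLPROPERTIES (
     |  'spark.sql.sources.provider'='orc',
     |  'spark.sql.sources.schema.numParts'='1',
     |  'spark.sql.sources.schema.part.0'='$escaped'
     |)""".stripMargin
```

Note that this sketch does not handle single quotes inside metadata values; those would need their own escaping before being placed in the DDL.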

Accessing the table from Spark

scala> val df = spark.table("hivedb.employee_details")
df: org.apache.spark.sql.DataFrame = [employee_name: string, employee_job: string]

scala> df.schema.prettyJson
res12: String =
{
  "type" : "struct",
  "fields" : [ {
    "name" : "employee_name",
    "type" : "string",
    "nullable" : true,
    "metadata" : {
      "comment" : "employee name",
      "business_key" : true,
      "system_name" : "hr system",
      "private_info" : true
    }
  }, {
    "name" : "employee_job",
    "type" : "string",
    "nullable" : true,
    "metadata" : {
      "comment" : "employee job description",
      "business_key" : false,
      "system_name" : "sap",
      "private_info" : false
    }
  } ]
}
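
Once the metadata comes back on the schema like this, it can be read per column. The following is a sketch of how the flags might be used, for example to drop columns tagged as private. The helpers `columnsFlagged` and `dropPrivate` are hypothetical names, not part of Spark; `Metadata.contains`, `Metadata.getBoolean` and `DataFrame.drop` are the Spark calls they rely on.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Names of the columns whose metadata has the given boolean flag set to true.
// Columns without the flag are treated as not flagged.
def columnsFlagged(schema: StructType, flag: String): Seq[String] =
  schema.fields.toSeq
    .filter(f => f.metadata.contains(flag) && f.metadata.getBoolean(flag))
    .map(_.name)

// Drop every column tagged with private_info = true.
def dropPrivate(df: DataFrame): DataFrame =
  df.drop(columnsFlagged(df.schema, "private_info"): _*)
```

With the table above, `columnsFlagged(df.schema, "business_key")` would be expected to return only `employee_name`, and `dropPrivate(df)` would leave just `employee_job`.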
