How to create a Hive DDL from a JSON schema

hi3rlvi2  posted on 2021-06-25  in  Hive

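I generated a JSON schema with PySpark's printSchema() function, and I want to use that JSON to create a Hive DDL.
I searched on Google but could not find a solution. Does anyone have an idea?
Thanks,
Lily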

hmae6n7t  #1

Here is an example of a Spark Python function that builds the Hive CREATE TABLE DDL from a DataFrame's schema and runs it. Adjust it as needed.

def sparkDataFrameCreateTable(df, T=''):
    # df.dtypes is a list of (column_name, type_string) tuples.
    kv = df.dtypes
    num = len(kv)
    ddl = ["CREATE TABLE IF NOT EXISTS {} (".format(T)]
    for count, i in enumerate(kv, start=1):
        print(count, num, i)  # debug output: position, total, column
        # Every column except the last one is followed by a comma.
        sep = "" if count == num else ", "
        ddl.append("{} {}{}".format(i[0], i[1], sep))
    ddl.append(") STORED AS PARQUET")
    schema_map = ''.join(ddl)
    print(schema_map)
    # Relies on an active SparkSession bound to the global name `spark`.
    exec_sql = spark.sql(schema_map)
    return exec_sql

df = spark.range(10)
spark.sql("create database if not exists junk")
spark.sql("show databases").show()
sparkDataFrameCreateTable(df, "junk.test")
spark.sql("use junk")
spark.sql("show tables").show()

+------------+
|databaseName|
+------------+
|     default|
|        junk|
+------------+

1 1 ('id', 'bigint')
CREATE TABLE IF NOT EXISTS junk.test (id bigint) STORED AS PARQUET
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
|    junk|     test|      false|
+--------+---------+-----------+
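
The function above starts from a live DataFrame, while the question starts from a JSON schema. As a rough sketch of that direction, assuming the JSON is in the format returned by `df.schema.json()` (printSchema() itself prints a tree, not JSON), the DDL string can be built in plain Python without a Spark session. The names `hive_type` and `json_schema_to_ddl` are hypothetical helpers made up for this illustration, and the type mapping only covers the common primitive, array, map, and struct cases.

```python
import json

# Spark's JSON schema spells a few primitive types differently from Hive.
# Names not listed here (string, double, boolean, decimal(10,2), ...) pass through unchanged.
_PRIMITIVES = {"long": "bigint", "integer": "int", "short": "smallint", "byte": "tinyint"}


def hive_type(t):
    """Translate one type node of a Spark JSON schema into a Hive type string."""
    if isinstance(t, str):
        return _PRIMITIVES.get(t, t)
    kind = t["type"]
    if kind == "array":
        return "array<{}>".format(hive_type(t["elementType"]))
    if kind == "map":
        return "map<{},{}>".format(hive_type(t["keyType"]), hive_type(t["valueType"]))
    if kind == "struct":
        inner = ",".join(
            "{}:{}".format(f["name"], hive_type(f["type"])) for f in t["fields"]
        )
        return "struct<{}>".format(inner)
    raise ValueError("unsupported type node: {!r}".format(t))


def json_schema_to_ddl(schema_json, table):
    """Build (but do not execute) a CREATE TABLE statement from a schema JSON string."""
    schema = json.loads(schema_json)
    cols = ", ".join(
        "{} {}".format(f["name"], hive_type(f["type"])) for f in schema["fields"]
    )
    return "CREATE TABLE IF NOT EXISTS {} ({}) STORED AS PARQUET".format(table, cols)


# Schema JSON for the same single-column DataFrame used in the answer above.
example = '{"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}'
print(json_schema_to_ddl(example, "junk.test"))
# prints: CREATE TABLE IF NOT EXISTS junk.test (id bigint) STORED AS PARQUET
```

The resulting string can then be passed to `spark.sql(...)` in the same way as in the answer. Column names that need quoting, and Spark types with no direct Hive equivalent, are not handled in this sketch.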
