Can't save table to Hive metastore, HDP 3.0

Posted by g6baxovj on 2021-06-27 in Hive


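I can no longer save a table to the Hive database using the metastore. I can see the tables in Spark through spark.sql, but I can't see the same tables in the Hive database. I tried the following, but it doesn't store the table in Hive. How can I configure the Hive metastore? The Spark version is 2.3.1.
If you want more details, please comment.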

%spark
import org.apache.spark.sql.SparkSession
val spark = (SparkSession
        .builder
        .appName("interfacing spark sql to hive metastore without configuration file")
        .config("hive.metastore.uris", "thrift://xxxxxx.xxx:9083") // replace with your hivemetastore service's thrift url
        .enableHiveSupport() // don't forget to enable hive support
        .getOrCreate())

spark.conf.get("spark.sql.warehouse.dir")// Output: res2: String = /apps/spark/warehouse
spark.conf.get("hive.metastore.warehouse.dir") // throws NoSuchElementException
spark.conf.get("spark.hadoop.hive.metastore.uris") // throws NoSuchElementException

val df = (spark
        .read
        .format("parquet")
        .load(dataPath)) // closing parenthesis was missing

df.createOrReplaceTempView("my_temp_table");
spark.sql("drop table if exists my_table");
spark.sql("create table my_table using hive as select * from my_temp_table");
spark.sql("show tables").show(false)// I see my_table in default database
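When probing the session configuration as above, it can be easier to look keys up without hitting an exception for every unset one. Spark's `RuntimeConfig` has a `getOption(key)` method that returns an `Option[String]`. The sketch below is only an illustration: `ConfProbe` and `describe` are hypothetical names, not part of Spark, and the lookup is passed in as a function so the helper itself has no Spark dependency.

```scala
object ConfProbe {
  // Hypothetical helper (not part of Spark): format each key with its value,
  // or with a placeholder when the lookup returns None, instead of throwing.
  def describe(keys: Seq[String], lookup: String => Option[String]): Seq[String] =
    keys.map { key =>
      val value = lookup(key).getOrElse("<not set>")
      s"$key = $value"
    }
}

// Usage in the Zeppelin paragraph, where `spark` is the session built above:
// val keys = Seq("spark.sql.warehouse.dir",
//                "hive.metastore.warehouse.dir",
//                "spark.hadoop.hive.metastore.uris")
// ConfProbe.describe(keys, k => spark.conf.getOption(k)).foreach(println)
```

This only reports what the session knows. It does not change which catalog Spark writes to.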

Update after @catpaws's answer: in HDP 3.0 and later, Hive and Spark use independent catalogs.
Saving the table to the Spark catalog:

df.createOrReplaceTempView("my_temp_table");
spark.sql("create table my_table as select * from my_temp_table");

versus
Saving the table to the Hive catalog:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.createTable("newTable")
  .ifNotExists()
  .column("ws_sold_time_sk", "bigint")
  ...// x 200 columns
  .column("ws_ship_date_sk", "bigint")
  .create()

df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()

As you can see, the Hive Warehouse Connector is very impractical for DataFrames with hundreds of columns. Is there any way to save large DataFrames to Hive?

Hive apache-spark hive-metastore apache-spark-2.3

Source: https://stackoverflow.com/questions/53323964/cant-save-table-to-hive-metastore-hdp-3-0

2 Answers


biswetbf1#

As @catpaws said, Spark and Hive use independent catalogs. To save a DataFrame with many columns using the Hive Warehouse Connector, you can use my function:

import org.apache.spark.sql.DataFrame

// Assumes `hive` is the HiveWarehouseSession built as in the question.
def save_table_hwc(df: DataFrame, database: String, tableName: String): Unit = {
    hive.setDatabase(database)
    // Drop any existing table first (ifExists = true, purge = false)
    hive.dropTable(tableName, true, false)
    // Add one column per DataFrame field; names keep only letters and digits
    var table_builder = hive.createTable(tableName)
    for (field <- df.schema.fields) {
        val name = field.name.replaceAll("[^\\p{L}\\p{Nd}]+", "")
        val data_type = field.dataType.sql
        table_builder = table_builder.column(name, data_type)
    }
    table_builder.create()
    df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", tableName).save()
}

// Call it after the definition
save_table_hwc(df1, "default", "table_test1")
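One thing worth checking before relying on the function above: the pattern `[^\p{L}\p{Nd}]+` removes everything that is not a letter or a digit, underscores included, so `ws_sold_time_sk` becomes `wssoldtimesk`, and two different columns could end up with the same name. Here is a minimal sketch in plain Scala (no Spark needed) that applies the same pattern and lists such collisions. `SanitizeColumns`, `sanitize` and `collisions` are hypothetical names for illustration, not part of HWC.

```scala
object SanitizeColumns {
  // Same pattern as in save_table_hwc: drop every run of characters that is
  // neither a Unicode letter nor a decimal digit.
  def sanitize(name: String): String =
    name.replaceAll("[^\\p{L}\\p{Nd}]+", "")

  // Group the original names by their sanitized form and keep only the groups
  // where more than one original name maps to the same result.
  def collisions(names: Seq[String]): Map[String, Seq[String]] =
    names.groupBy(sanitize).filter { case (_, originals) => originals.size > 1 }
}
```

With a real DataFrame you could run `SanitizeColumns.collisions(df.columns.toSeq)` first. An empty map means the sanitized names are still unique.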

Posted 2021-06-27

rlcwz9us2#

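From the Hortonworks documentation: in HDP 3.0 and later, Spark and Hive use independent catalogs for accessing SparkSQL or Hive tables on the same or different platforms. A table created by Spark resides in the Spark catalog. A table created by Hive resides in the Hive catalog. Databases fall under the catalog namespace, similar to how tables belong to a database namespace. Although independent, these tables interoperate, and you can see Spark tables in the Hive catalog, but only when using the Hive Warehouse Connector.
Use the write operations of the HWC API to write a DataFrame to Hive.
Update: you can now (using HDP 3.1) create a DataFrame, and if the Hive table representing that DataFrame doesn't exist, the Hive Warehouse Connector creates it, as shown in the HDP 3.1 documentation: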

df = //Create DataFrame from any source

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

df.write.format(HIVE_WAREHOUSE_CONNECTOR)
.option("table", "my_Table")
.save()

Posted 2021-06-27
