I am trying to do a simple task of writing a DataFrame to a Hive table, using the Java code below. I am running on a stock Cloudera VM with no changes.
public static void main(String[] args) {
String master = "local[*]";
SparkSession sparkSession = SparkSession
.builder().appName(JsonToHive.class.getName())
//.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
.enableHiveSupport().master(master).getOrCreate();
SparkContext context = sparkSession.sparkContext();
context.setLogLevel("ERROR");
Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
rowDataset.printSchema();
rowDataset.createOrReplaceTempView("employeesData");
Dataset<Row> firstRow = sparkSession.sql("select employee.firstName, employee.addresses from employeesData");
firstRow.show();
sparkSession.catalog().listTables().select("*").show();
firstRow.write().saveAsTable("default.employee");
sparkSession.close();
}
I have already created a managed table in Hive with the following HQL:
CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY<STRUCT<street:STRING, city:STRING, state:STRING>>) STORED AS PARQUET;
I am reading a simple JSON file containing the following data in "employees.json":
{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}
The code above fails with "Table default.employee already exists" and does not append the content. How can I append content to the Hive table?
If I set mode("append"), it does not complain, but it does not write the content either:
firstRow.write().mode("append").saveAsTable("default.employee");
Any help would be appreciated... Thanks.
+-------------+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
| employee| default| null| MANAGED| false|
|employeesdata| null| null|TEMPORARY| true|
+-------------+--------+-----------+---------+-----------+
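When mode("append") appears to succeed but nothing shows up in Hive, it is worth checking which warehouse Spark is actually writing to: without hive-site.xml on the classpath, Spark 2.x falls back to a local embedded metastore and a local spark-warehouse directory. A minimal sketch of that check (untested here; the class name is mine, the session setup mirrors the code above):

```java
import org.apache.spark.sql.SparkSession;

public class WarehouseCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("WarehouseCheck").master("local[*]")
                .enableHiveSupport().getOrCreate();
        // Without hive-site.xml on the classpath, this usually points at a
        // local ./spark-warehouse directory rather than the HDFS warehouse.
        System.out.println(spark.conf().get("spark.sql.warehouse.dir"));
        // Shows the physical Location the table data is written to.
        spark.sql("DESCRIBE FORMATTED default.employee").show(100, false);
        spark.close();
    }
}
```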
Update
/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark was not reading the Hive metastore tables; after adding it to the classpath, it worked fine. I had this problem because I was running from IntelliJ. In production the Spark conf folder is linked to hive-site.xml.
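For reference, a sketch of the two usual ways to make hive-site.xml visible to Spark (assuming the stock Cloudera VM paths mentioned above; adjust for your install):

```shell
# Option 1: link Hive's config into Spark's conf directory, which is what
# a production install typically does.
sudo ln -s /usr/lib/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml

# Option 2: for an IDE or spark-submit run, put the directory containing
# hive-site.xml on the application classpath, e.g.:
# spark-submit --files /usr/lib/hive/conf/hive-site.xml ...
```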
1 answer
It looks like you should use insertInto(String tableName) instead of saveAsTable(String tableName).
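A sketch of that suggestion (untested; the class name is mine). Note that insertInto() resolves columns by position rather than by name, so the select must produce the columns in the order of the table DDL above: firstName, lastName, addresses.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class AppendToHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("AppendToHive").master("local[*]")
                .enableHiveSupport().getOrCreate();
        Dataset<Row> rows = spark.read().json("employees.json");
        rows.createOrReplaceTempView("employeesData");
        // Order the columns to match the table DDL, because insertInto()
        // matches by POSITION, not by column name.
        Dataset<Row> employees = spark.sql(
                "select employee.firstName, employee.lastName, employee.addresses "
                + "from employeesData");
        // Appends into the existing managed Hive table instead of trying to
        // (re)create it, which is what saveAsTable() would attempt.
        employees.write().mode(SaveMode.Append).insertInto("default.employee");
        spark.close();
    }
}
```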