Error inserting into an external Hive table

6ojccjat posted on 2021-06-27 in Hive


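I am trying to insert data into an external Hive table through Spark SQL. My Hive table is bucketed by a column. The query that creates the external Hive table is as follows: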

create external table tab1 ( col1 type,col2 type,col3 type) clustered by (col1,col2) sorted by (col1) into 8 buckets stored as parquet

Now I am trying to store data from a Parquet file (stored in HDFS) into the table. This is my code:

SparkSession session = SparkSession.builder().appName("ParquetReadWrite").
                    config("hive.exec.dynamic.partition", "true").
                    config("hive.exec.dynamic.partition.mode", "nonstrict").
                    config("hive.execution.engine","tez").
                    config("hive.exec.max.dynamic.partitions","400").
                    config("hive.exec.max.dynamic.partitions.pernode","400").
                    config("hive.enforce.bucketing","true").
                    config("optimize.sort.dynamic.partitionining","true").
                    config("hive.vectorized.execution.enabled","true").
                    config("hive.enforce.sorting","true").
                    enableHiveSupport()
                    .master(args[0]).getOrCreate();
String insertSql="insert into tab1 select * from"+"'"+parquetInput+"'";

session.sql(insertSql);

When I run the code, it throws the following error:
mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting '' (line 1, pos 50)
== SQL ==
insert into uk_district_month_data select * from'hdfs://url:port/user/clsadmin/somedata.parquet'

Hive apache-spark apache-spark-sql parquet

Source: https://stackoverflow.com/questions/52638606/spark-sql-insert-into-external-hive-table-error

4 answers


shyt4zoc1#

When you create an external table in Hive, you need to specify its HDFS location.

create external table tab1 ( col1 type,col2 type,col3 type) 
clustered by (col1,col2) sorted by (col1) into 8 buckets 
stored as parquet 
LOCATION 'hdfs://url:port/user/clsadmin/tab1'

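There is no need for Hive itself to populate the data. The same application, or any other application, can ingest data into that location, and Hive will access the data through the schema defined on top of the location.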


nszi6y052#

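parquetInput is the HDFS path of a Parquet file, not a Hive table name. That is why the error occurs.
There are two ways to solve this:
1. Define an external table over "parquetInput" and select from that table by name.
2. Use LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1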
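
A rough sketch of the first option, in case it helps. It only builds the two SQL strings, so it runs without a cluster. The staging table name tab1_staging and the string column types are my own placeholders (the question only says "type"), and I am assuming somedata.parquet is a directory of Parquet files, since an external table LOCATION has to point at a directory.

```java
// Sketch only: builds the SQL for "external staging table + insert by table name".
// The table name tab1_staging and the column types below are hypothetical.
final class StagingTableSql {
    private StagingTableSql() {}

    /** DDL for an external Parquet table over an existing HDFS directory. */
    public static String createStagingTable(String table, String columnDefs, String location) {
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table
                + " (" + columnDefs + ")"
                + " STORED AS PARQUET"
                + " LOCATION '" + location + "'";
    }

    /** INSERT that reads from a table name instead of a quoted file path. */
    public static String insertFromStaging(String target, String staging) {
        return "INSERT INTO " + target + " SELECT * FROM " + staging;
    }

    public static void main(String[] args) {
        String ddl = createStagingTable(
                "tab1_staging",
                "col1 string, col2 string, col3 string",
                "hdfs://url:port/user/clsadmin/somedata.parquet");
        String insert = insertFromStaging("tab1", "tab1_staging");
        System.out.println(ddl);
        System.out.println(insert);
    }
}
```

In the application from the question, each string would then be passed to session.sql(...) in that order.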


6gpjuf903#

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)

What is the difference between using tez and spark as the Hive execution engine?


ryevplcw4#

Have you tried
LOAD DATA LOCAL INPATH '/path/to/data'
OVERWRITE INTO TABLE tablename;
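
If the statement has to be assembled in Java, something like the following might do. It is only a sketch: the class and method names are made up for illustration, and it just concatenates the LOAD DATA clauses (optional LOCAL, optional OVERWRITE) without validating the path or table name.

```java
// Sketch only: assembles a LOAD DATA statement from its optional clauses.
// LoadDataSql and loadData are hypothetical names, not part of Spark or Hive.
final class LoadDataSql {
    private LoadDataSql() {}

    public static String loadData(String path, String table, boolean local, boolean overwrite) {
        StringBuilder sql = new StringBuilder("LOAD DATA ");
        if (local) {
            sql.append("LOCAL ");      // read from the local file system instead of HDFS
        }
        sql.append("INPATH '").append(path).append("' ");
        if (overwrite) {
            sql.append("OVERWRITE ");  // replace existing table contents
        }
        sql.append("INTO TABLE ").append(table);
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(loadData("/path/to/data", "tablename", true, true));
        System.out.println(loadData(
                "hdfs://url:port/user/clsadmin/somedata.parquet", "tab1", false, false));
    }
}
```

The resulting string can be handed to session.sql(...) in place of the insert statement from the question.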
