I am trying to create an external table from Databricks using a location in ADLS Gen2 and the Parquet format. I took the Parquet dataset from the following URL: https://github.com/teradata/kylo/tree/master/samples/sample-data
I created the DDL in Databricks, and executing it pops up the following error: Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384;
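From what I can tell, this error is usually raised by the Hive metastore client when a table is declared in Hive format (STORED AS PARQUET) with a TIMESTAMP column. To illustrate, a statement of roughly the following shape tends to trigger it; this is only a sketch, reusing the table name, columns and mount path that appear later in this post, with the column list abbreviated:
// Illustrative sketch only, not my exact statement: a Hive-format table with
// a TIMESTAMP column is the kind of DDL that raises
// "Parquet does not support timestamp. See HIVE-6384" against older Hive
// metastore clients.
spark.sql("""
  CREATE EXTERNAL TABLE testdb.ptables (
    registration_dttm TIMESTAMP,
    id INT,
    first_name STRING
  )
  STORED AS PARQUET
  LOCATION '/mnt/landing/testTable/person'
""")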
I tried creating the same with the DDL below,
import org.apache.spark.sql.parquet
create external table testdb.ptables;
(
registration_dttm string,
id int,
first_name string,
last_name string,
email string,
gender string,
ip_address string,
cc string,
country string,
birthdate string,
salary double,
title string,
comments string
)
USING parquet
OPTIONS(path "/mnt/landing/testTable/person");
I am getting the following error: <console>:24: error: ')' expected but string literal found. OPTIONS(path "/mnt/landing/testTable/person");
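From what I can tell, that "')' expected" message comes from the Scala compiler rather than from Spark SQL, which suggests the raw DDL is being parsed by the Scala console. One thing I could try, assuming a Databricks Scala notebook cell, is to hand the statement to Spark SQL as a string, as in the sketch below (the stray semicolon after the table name is dropped, the EXTERNAL keyword is left out because specifying a path already makes the table unmanaged in Spark SQL and some Spark versions reject CREATE EXTERNAL TABLE combined with USING, and the column list is abbreviated here):
// Minimal sketch, assuming a Databricks Scala cell: passing the DDL to
// spark.sql lets Spark SQL parse it instead of the Scala compiler, which is
// what produces the "')' expected" message above.
spark.sql("""
  CREATE TABLE testdb.ptables (
    registration_dttm STRING,
    id INT,
    first_name STRING
  )
  USING parquet
  OPTIONS (path "/mnt/landing/testTable/person")
""")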
When I change the data type timestamp to string, I can create the table with the DDL, but when I query it with select * from <table_name> I get the following error:
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, 10.139.64.4, executor 0): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/landing/testTable/person/userdata1.parquet.
I have 5 Parquet files in the mentioned location, and it errors out on the first file.
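In case it helps with diagnosis, my understanding is that a FileReadException like this often points at a mismatch between the declared column type and the physical Parquet type (registration_dttm is presumably stored as a Parquet timestamp in these files, while the table declares it as string). A quick check, assuming the same mount path, would be to let Spark infer the schema directly from the files:
// Sketch: infer the schema straight from the Parquet files and compare it
// with the declared table schema; registration_dttm is expected to show up
// as a timestamp here even though the table declares it as string.
val df = spark.read.parquet("/mnt/landing/testTable/person")
df.printSchema()
df.show(5)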
Can anyone help with this issue?
Thanks