在这里,我试图将Dataframe持久化到一个分区的hive表中,并得到这个愚蠢的异常。我看了很多遍,但都找不到毛病。
org.apache.spark.sql.analysisexception:指定的分区列(时间戳值)与表的分区列不匹配。请使用()作为分区列。;
下面是创建外部表的脚本,
CREATE EXTERNAL TABLEIF NOT EXISTS events2 (
action string
,device_os_ver string
,device_type string
,event_name string
,item_name string
,lat DOUBLE
,lon DOUBLE
,memberid BIGINT
,productupccd BIGINT
,tenantid BIGINT
) partitioned BY (timestamp_val DATE)
row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
location 'maprfs:///location/of/events2'
tblproperties ('serialization.null.format' = '');
下面是表“events2”的结果
hive> describe formatted events2;
OK
# col_name data_type comment
action string
device_os_ver string
device_type string
event_name string
item_name string
lat double
lon double
memberid bigint
productupccd bigint
tenantid bigint
# Partition Information
# col_name data_type comment
timestamp_val date
# Detailed Table Information
Database: default
CreateTime: Wed Jan 11 16:58:55 IST 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: maprfs:/location/of/events2
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
serialization.null.format
transient_lastDdlTime 1484134135
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.078 seconds, Fetched: 42 row(s)
这是一行代码,其中数据被分区并存储到表中,
val tablepath = Map("path" -> "maprfs:///location/of/events2")
AppendDF.write.format("parquet").partitionBy("Timestamp_val").options(tablepath).mode(org.apache.spark.sql.SaveMode.Append).saveAsTable("events2")
在运行应用程序时,我得到以下信息
指定的分区列(timestamp_val)与表的分区列不匹配。请使用()作为分区列。
我可能犯了一个明显的错误,任何帮助都非常感谢您的支持:)
1条答案
按热度按时间o4tp2gmn1#
请打印df的架构:
确保它不是类型不匹配??