带有分区投影的athena不返回任何结果

o0lyfsai  于 2021-05-27  发布在  Hadoop
关注(0)|答案(1)|浏览(404)

在为新的etl管道做概念验证时,我在aws雅典娜中使用分区投影解决了一些问题。在glue中创建了下表:

CREATE EXTERNAL TABLE `test_interactions`(
  `id` string, 
  `created_at` timestamp, 
  `created_by` string, 
  `type` string, 
  `entity` string)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'projection.dt.format'='yyyy-MM-dd-HH', 
  'projection.dt.interval'='1', 
  'projection.dt.interval.unit'='HOURS', 
  'projection.dt.range'='2020-12-01-00,NOW', 
  'projection.dt.type'='date', 
  'projection.enabled'='true', 
  'storage.location.template'='s3://test-aggs/test-interactions/dt=${dt}') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://test-aggs/test-interactions/'
TBLPROPERTIES (
  'classification'='parquet')

在s3上,有来自kinesis数据firehose的匹配.parquet文件:

test-aggs/test-interactions/dt=2020-12-03-22/file1.parquet
test-aggs/test-interactions/dt=2020-12-03-22/file2.parquet

尝试通过以下方式查询数据:

SELECT * FROM "test_aggs"."test_interactions"
WHERE dt >= '2020-12-02-00' 
AND dt < '2020-12-04-01'

或者

SELECT * FROM "test_aggs"."test_interactions"
WHERE dt = '2020-12-03-22'

返回零结果。
跑步

MSCK REPAIR TABLE;

使数据可查询,但使用这个慢命令,我不必启用分区投影。
你知道为什么这样不行吗?
干杯!

2izufjch

2izufjch1#

解决了这个问题。问题是我在下面添加了投影配置 SERDE PROPERTIES 不低于 TBLPROPERTIES .

CREATE EXTERNAL TABLE `test_interactions`(
  `id` string, 
  `created_at` timestamp, 
  `created_by` string, 
  `type` string, 
  `entity` string)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://test-aggs/test-interactions/'
TBLPROPERTIES (
  'classification'='parquet', 
  'projection.dt.format'='yyyy-MM-dd-HH', 
  'projection.dt.interval'='1', 
  'projection.dt.interval.unit'='HOURS', 
  'projection.dt.range'='2020-12-01-00,NOW', 
  'projection.dt.type'='date', 
  'projection.enabled'='true', 
  'storage.location.template'='s3://test-aggs/test-interactions/dt=${dt}')

相关问题