Presto cannot query data from Hive

v2g6jxz6 posted on 2021-05-17 in Spark

Closed. This question needs details or clarity. It is not currently accepting answers.
I am facing a problem: I cannot query data from Hive with Presto. The Hive data was written by Spark.

io.prestosql.spi.PrestoException: Cannot get bucket number from path: hdfs://xxx:8020/warehouse/tablespace/managed/hive/ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:257)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_344____20201118_122905_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Cannot get bucket number from path: hdfs://xxxx:8020/warehouse/tablespace/managed/hive/ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.lambda$getRequiredBucketNumber$9(BackgroundHiveSplitLoader.java:733)
    at java.base/java.util.OptionalInt.orElseThrow(OptionalInt.java:271)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.getRequiredBucketNumber(BackgroundHiveSplitLoader.java:733)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:511)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:321)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:250)
    ... 6 more

Does anyone know the cause?

w8f9ii69 #1

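The table is declared as bucketed in the Hive metastore, but the actual files are not bucketed. You need to fix the table declaration to make it unbucketed. I think you need to use the Hive CLI for this.
Note that even if Spark populated the table with bucketed files, this could still lead to incorrect query results because of SPARK-19256 (https://issues.apache.org/jira/browse/spark-19256). We are going to detect this and prevent incorrect query results in https://github.com/prestosql/presto/pull/6012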
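
To illustrate the first point, here is a minimal Python sketch of why a reader that expects bucketed files fails on this path. It assumes Hive-style bucket file names such as 000003_0, where the leading digits are the bucket index. The regular expression and the function bucket_number_from_path are hypothetical stand-ins for illustration, not Presto's actual implementation, and the Hive-style path is made up for comparison.

```python
import re
from typing import Optional

# Assumption: Hive-written bucket files are named like "000000_0" or
# "000003_0.snappy.orc", where the leading digits are the bucket index.
# This is a simplified stand-in, not the pattern Presto uses internally.
HIVE_BUCKET_FILE = re.compile(r"^(\d+)_\d+")


def bucket_number_from_path(path: str) -> Optional[int]:
    """Return the bucket index encoded in the file name, or None if absent."""
    file_name = path.rsplit("/", 1)[-1]
    match = HIVE_BUCKET_FILE.match(file_name)
    return int(match.group(1)) if match else None


# File name taken from the stack trace in the question (written by Spark).
spark_file = (
    "hdfs://xxx:8020/warehouse/tablespace/managed/hive/"
    "ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/"
    "part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc"
)
# Hypothetical Hive-style name in the same partition, for comparison.
hive_file = (
    "hdfs://xxx:8020/warehouse/tablespace/managed/hive/"
    "ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/000003_0"
)

print(bucket_number_from_path(spark_file))  # None: no bucket index in the name
print(bucket_number_from_path(hive_file))   # 3
```

The Spark-written file name carries no bucket index, which is consistent with the "Cannot get bucket number from path" error when the metastore says the table is bucketed. As for the fix itself, my understanding is that Hive DDL has an ALTER TABLE ... NOT CLUSTERED form for dropping the bucketing declaration, but treat that as an assumption and check it against the Hive documentation for your version before running it.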
