How to read a Parquet file in Spark that was written using bucketBy

Posted by 06odsfpq on 2021-05-27 in Spark


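I have been struggling with this for quite a while. I know that when you write a bucketed table with Spark, you have to do something like the following (here df is the DataFrame being written, and numBuckets and "bucket_col" are placeholders for your own bucketing spec):

df.write.format("parquet").bucketBy(numBuckets, "bucket_col").option("path", "some_path").saveAsTable("t1")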

In my use case, I may have to read those Parquet files much later, from a brand-new Spark context. I have tried the following:

- spark.read.parquet(...)
- spark.read.format("parquet").option("path", "some_path").table("t1")
- spark.sql("create table t1 using parquet location 'some_path'")

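But with none of them does describe extended t1 report the table as bucketed. How can I read this data in Spark and make use of the predefined buckets? Should I read the data, write it to a temporary table, and use that directly? (I tested this and it works, but creating the table takes up some space...)
Thanks
Edit: To check the bucketing, I also tried running some joins on my tables (I have two bucketed tables). The query plan always performs a shuffle, even with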

spark.conf.set("spark.sql.sources.bucketing.enabled", true)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
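
For reference, here is a minimal sketch of the kind of check described above, written in Scala. It is an illustration under assumptions, not the exact code from my job: the local master, the generated `id` data, the bucket count of 8, and the object name `BucketedJoinCheck` are all hypothetical. It writes two tables bucketed on the same column, joins them, and looks for an `Exchange` (shuffle) node in the physical plan.

```scala
import org.apache.spark.sql.SparkSession

object BucketedJoinCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("bucketed-join-check")
      .master("local[*]")
      .getOrCreate()

    // Same settings as in the question: keep bucketing on, disable broadcast joins.
    spark.conf.set("spark.sql.sources.bucketing.enabled", true)
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

    // Hypothetical data: two tables bucketed on "id" into 8 buckets each.
    Seq("t1", "t2").foreach { name =>
      spark.range(0, 1000).toDF("id")
        .write.format("parquet")
        .bucketBy(8, "id")
        .mode("overwrite")
        .saveAsTable(name)
    }

    // The table metadata should list the bucket spec if the catalog knows about it.
    spark.sql("DESCRIBE EXTENDED t1").show(100, false)

    // Join on the bucketing column and inspect the physical plan for a shuffle.
    val joined = spark.table("t1").join(spark.table("t2"), "id")
    val plan = joined.queryExecution.executedPlan.toString
    println(plan)
    println(s"Shuffle (Exchange) present in plan: ${plan.contains("Exchange")}")

    spark.stop()
  }
}
```

When the tables are read through the catalog entry created by saveAsTable, I would expect the bucket spec to be visible and the join to avoid the shuffle. The problem in my case is that the tables are re-created from the files alone, so this sketch only shows how I am checking, not a fix.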

apache-spark apache-spark-sql parquet

Source: https://stackoverflow.com/questions/63661093/how-to-read-a-parquet-file-in-spark-which-was-writen-using-bucketby
