While trying to repartition a Delta Lake table that is partitioned by date (yyyy-mm-dd) and time (hhmm), I get the following error:
File "/usr/local/lib/python3.7/site-packages/pyspark/sql/readwriter.py", line 739, in save
self._jwrite.save(path)
File "/usr/local/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/lib/python3.7/site-packages/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Cannot recognize the predicate 'Column<b'((partitionTime = 1357) AND (partitionDate = 2020-10-27))'>';"
I can filter on either of the two partition columns on its own, but when I filter on both at the same time I get the error above:
import pyspark.sql.functions as sf

spark \
.read.format("delta") \
.load(table_path) \
.where(((sf.col("partitionTime") == "1357") & (sf.col("partitionDate") == "2020-10-27"))) \
.repartition(n_partitions) \
.write \
.option("dataChange", "false") \
.format("delta") \
.mode("overwrite") \
.option("replaceWhere", ((sf.col("partitionTime") == "1357") & (sf.col("partitionDate") == "2020-10-27") )) \
.save(table_path)
I'd like to know what is causing this problem! I did follow the documentation on delta.io.
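For reference, the examples I can find on delta.io pass replaceWhere as a SQL string predicate rather than a pyspark Column expression, so something like the sketch below may be what is expected (same placeholder path and partition values as above; I have not confirmed that this resolves the error):

# build the predicate once as a plain SQL string
predicate = "partitionDate = '2020-10-27' AND partitionTime = '1357'"

spark \
.read.format("delta") \
.load(table_path) \
.where(predicate) \
.repartition(n_partitions) \
.write \
.option("dataChange", "false") \
.format("delta") \
.mode("overwrite") \
.option("replaceWhere", predicate) \
.save(table_path)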