PySpark - MERGE INTO TABLE is not supported temporarily

tv6aics1 posted on 2023-08-02 in Spark

I am currently trying to merge two tables in a fresh virtual environment with only pyspark and pyspark[sql] installed. I have created views for each table and run some basic queries against them, which work fine. But the following query using MERGE fails:

MERGE INTO current
USING (
    SELECT updates.Name AS mergeKey, updates.*
    FROM updates

    UNION ALL

    SELECT NULL AS mergeKey, updates.*
    FROM updates JOIN current
      ON updates.Name = current.Name
    WHERE current.current = true
) staged_updates
ON current.Name = mergeKey
WHEN MATCHED AND current.current = true THEN
    UPDATE SET current = false, validity_end = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
    INSERT (S_No, Name, DOB, validity_start, validity_end, current)
    VALUES (null, staged_updates.Name, staged_updates.updated_DOB, current_timestamp(), null, true)

When I run the query with spark.sql(merge_query), I get the following error:

Py4JJavaError: An error occurred while calling o23.sql.
: org.apache.spark.SparkUnsupportedOperationException: MERGE INTO TABLE is not supported temporarily.


The full stack trace can be found here: https://pastebin.com/mgL92Vgx

nbysray5

"only pyspark and pyspark[sql] installed"
MERGE (and UPDATE/DELETE) is not implemented in vanilla Spark. You need to add a table-format dependency, for example one of the following (a configuration sketch follows the list):

  • Apache Hudi
  • Apache Iceberg
  • Delta Lake
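
As a minimal sketch of the Delta Lake route (assumptions: the `delta-spark` pip package is installed as an extra dependency, and the names `current`, `updates`, and `merge_query` are reused from the question; Hudi or Iceberg would need their own, analogous session settings):

    # pip install delta-spark   (pick the release that matches your PySpark version)
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    # Register Delta's SQL extension and catalog so Spark can plan MERGE INTO.
    builder = (
        SparkSession.builder.appName("merge-demo")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # The MERGE target has to be a Delta table, not just a temporary view:
    # current_df.write.format("delta").saveAsTable("current")
    # The source can remain a temporary view:
    # updates_df.createOrReplaceTempView("updates")

    spark.sql(merge_query)  # the MERGE statement from the question

The idea is the same with Iceberg or Hudi: add the project's Spark runtime jar plus its SQL extension class to the session config and write the target table in that format, so the MERGE statement is handled by that catalog instead of failing in vanilla Spark.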
