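I'm trying to merge two tables in a fresh virtual environment where only pyspark and pyspark[sql] are installed. I've created a view for each table and run some basic queries against them, and those work fine. However, the following query, which uses MERGE, fails: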
MERGE INTO current
USING (
SELECT updates.Name as mergeKey, updates.*
FROM updates
UNION ALL
SELECT NULL as mergeKey, updates.*
FROM updates JOIN current
ON updates.Name = current.Name
WHERE current.current = true
) staged_updates
ON current.Name = mergeKey
WHEN MATCHED AND current.current = true THEN
UPDATE SET current = false, validity_end = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
INSERT(S_No,Name, DOB, validity_start, validity_end, current)
VALUES(null,staged_updates.Name, staged_updates.updated_DOB, current_timestamp(), null, true)
When I run the query with spark.sql(merge_query), I get the following error:
Py4JJavaError: An error occurred while calling o23.sql.
: org.apache.spark.SparkUnsupportedOperationException: MERGE INTO TABLE is not supported temporarily.
The full error output is available here: https://pastebin.com/mgL92Vgx
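The MERGE statement in the question uses a staging trick for a Type 2 slowly changing dimension: every update row is staged once with mergeKey = Name, so it matches and closes out the existing current row, and, for names that already have a current row, once more with mergeKey NULL, which can never match and is therefore inserted as the new current version. The pure-Python sketch below mimics those semantics on lists of dicts so the logic can be checked without Spark. The function name scd2_merge and the sample data are hypothetical; the sketch assumes each Name appears at most once in updates, and it does not reproduce Spark's row ordering.

```python
from datetime import datetime


def scd2_merge(current, updates, now):
    """Hypothetical helper: simulate the SCD Type 2 MERGE on lists of dicts."""
    # Names with an active row (the JOIN ... WHERE current.current = true part).
    active_names = {row["Name"] for row in current if row["current"]}

    # staged_updates: every update keyed by Name, plus a NULL-keyed copy of
    # each update whose Name already has an active row.
    staged = [dict(u, mergeKey=u["Name"]) for u in updates]
    staged += [dict(u, mergeKey=None) for u in updates if u["Name"] in active_names]

    result = [dict(row) for row in current]  # copy so the input is untouched
    inserts = []
    for s in staged:
        # In SQL, NULL = x is never true, so a None mergeKey never matches.
        matched = [r for r in result
                   if s["mergeKey"] is not None and r["Name"] == s["mergeKey"]]
        if matched:
            # WHEN MATCHED AND current.current = true: close out the active row.
            for r in matched:
                if r["current"]:
                    r["current"] = False
                    r["validity_end"] = now
        else:
            # WHEN NOT MATCHED: insert a new active version of the row.
            inserts.append({
                "S_No": None,
                "Name": s["Name"],
                "DOB": s["updated_DOB"],
                "validity_start": now,
                "validity_end": None,
                "current": True,
            })
    return result + inserts


# Usage: Alice gets a new DOB, Bob is a brand-new name.
table = [{"S_No": 1, "Name": "Alice", "DOB": "1990-01-01",
          "validity_start": datetime(2023, 1, 1), "validity_end": None, "current": True}]
updates = [{"Name": "Alice", "updated_DOB": "1990-02-02"},
           {"Name": "Bob", "updated_DOB": "1985-05-05"}]
for row in scd2_merge(table, updates, datetime(2024, 1, 1)):
    print(row["Name"], row["DOB"], row["current"])
```

Alice ends up with two rows (the old one closed out, the new one active), while Bob, who had no active row, is inserted exactly once because the JOIN produces no NULL-keyed copy for him.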
1 answer
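"only pyspark and pyspark[sql] are installed"
MERGE, as well as UPDATE and DELETE, is not implemented in vanilla Spark. You need to add a dependency that provides it, for example: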
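The example dependency is cut off here. One widely used option is Delta Lake, and the query in the question closely resembles the SCD Type 2 MERGE pattern from the Delta Lake documentation, so the following is a minimal sketch assuming Delta Lake is the dependency. The two config keys and configure_spark_with_delta_pip come from the delta-spark package's quickstart; the helper name build_delta_session is hypothetical, and the delta-spark release you install has to match your pyspark version. The MERGE target would also generally need to be backed by a Delta table rather than a plain temporary view.

```python
# Minimal sketch assuming Delta Lake is the extra dependency:
#   pip install delta-spark   (choose the release that matches your pyspark version)

# Session settings from the Delta Lake quickstart; they register Delta's SQL
# extension (which adds MERGE INTO, UPDATE and DELETE) and its catalog.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="scd2-merge"):
    """Hypothetical helper: return a SparkSession with Delta Lake enabled."""
    # Imported lazily so this module can be loaded even where Spark is absent.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    # Adds the Delta Lake Maven package matching the installed delta-spark
    # version to spark.jars.packages, then starts (or reuses) the session.
    return configure_spark_with_delta_pip(builder).getOrCreate()
```

With that session, the target could be written once as a Delta table, e.g. current_df.write.format("delta").saveAsTable("current") (current_df being a hypothetical DataFrame holding the existing rows), after which spark.sql(merge_query) is handled by Delta instead of being rejected by Spark.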