我有两个 Dataframe :
df = spark.createDataFrame([("joe", 34), ("luisa", 22)], ["name", "age"])
df2 = spark.createDataFrame([("joe", 88), ("luisa", 99)], ["name", "age"])
我想在名字匹配时更新年龄。所以我想使用when()会起作用。
df.withColumn("age", F.when(df.name == df2.name, df2.age)).otherwise(df.age)
但这会导致以下错误:
AnalysisException: Resolved attribute(s) name#181,age#182L missing from name#177,age#178L in operator !Project [name#177, CASE WHEN (name#177 = name#181) THEN age#182L END AS age#724L]. Attribute(s) with the same name appear in the operation: name,age. Please check if the right attribute(s) are used.;
如何解决这个问题?因为当我打印when语句时,我会看到:
Column<'CASE WHEN (name = name) THEN age ELSE age END'>
2条答案
按热度按时间jxct1oxe1#
通过“更新年龄”,我假设您想要最新/最大年龄:
tjjdgumg2#
您需要一个join: