AnalysisException when using when() in pyspark

zdwk9cvp  asked on 2022-11-01  in Spark

I have two DataFrames:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("joe", 34), ("luisa", 22)], ["name", "age"])
df2 = spark.createDataFrame([("joe", 88), ("luisa", 99)], ["name", "age"])

I want to update the age whenever the names match, so I thought when() would work:

df.withColumn("age", F.when(df.name == df2.name, df2.age)).otherwise(df.age)

But this results in the following error:

AnalysisException: Resolved attribute(s) name#181,age#182L missing from name#177,age#178L in operator !Project [name#177, CASE WHEN (name#177 = name#181) THEN age#182L END AS age#724L]. Attribute(s) with the same name appear in the operation: name,age. Please check if the right attribute(s) are used.;

How can I fix this? Because when I print the when statement, I see:

Column<'CASE WHEN (name = name) THEN age ELSE age END'>

jxct1oxe1#

By "update the age" I assume you want the latest/greatest age:

df \
  .join(df2, how="inner", on="name") \
  .withColumn("updated_age", F.greatest(df2.age, df.age)) \
  .select("name", F.col("updated_age").alias("age"))

[Out]:
+-----+---+
| name|age|
+-----+---+
|  joe| 88|
|luisa| 99|
+-----+---+
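
Worth noting: because this uses an inner join, a name that exists only in df is dropped from the result. A minimal sketch to illustrate, using a hypothetical extra row ("bob", 40) that is not in the original data:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# "bob" is a hypothetical row, added only to show the inner-join behaviour
df = spark.createDataFrame([("joe", 34), ("luisa", 22), ("bob", 40)], ["name", "age"])
df2 = spark.createDataFrame([("joe", 88), ("luisa", 99)], ["name", "age"])

df \
  .join(df2, how="inner", on="name") \
  .withColumn("updated_age", F.greatest(df2.age, df.age)) \
  .select("name", F.col("updated_age").alias("age")) \
  .show()

# "bob" does not appear in the output, because the inner join only keeps
# names present in both DataFrames.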

tjjdgumg2#

You need a join:

df.join(df2, how="left", on="name").withColumn("age", F.coalesce(df2.age, df.age))
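
A minimal runnable sketch of this approach; the final select (to keep a single age column) and the variable name result are my additions, not part of the answer:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("joe", 34), ("luisa", 22)], ["name", "age"])
df2 = spark.createDataFrame([("joe", 88), ("luisa", 99)], ["name", "age"])

# The left join keeps every row of df; coalesce takes df2's age when a
# match exists and falls back to df's original age otherwise.
result = df \
  .join(df2, how="left", on="name") \
  .select("name", F.coalesce(df2.age, df.age).alias("age"))

result.show()
# Expected output (row order may vary):
# +-----+---+
# | name|age|
# +-----+---+
# |  joe| 88|
# |luisa| 99|
# +-----+---+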
