Add a new column to a PySpark DataFrame holding the maximum value of another column

s2j5cfk0  posted on 2021-05-27  in Spark

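I need your help. I am trying to append the maximum value of another column to an existing DataFrame as a new column, but I get the error below. This is what I am doing.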

df1 = df.withColumn('WEEK_START_DATE', df.agg(f.max('DATE')))

error:
AttributeError: 'DataFrame' object has no attribute '_get_object_id'

abithluo  #1

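I don't think the result of an aggregation can be passed to withColumn directly (withColumn expects a Column, while df.agg(...) returns a DataFrame), but here are some workarounds for this case.

1. Using crossJoin: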

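```python
from pyspark.sql.functions import *  # note: this shadows Python's built-in max

df.show()
# +---+----+
# | id|name|
# +---+----+
# |  1|   a|
# |  2|   b|
# |  3|   c|
# +---+----+

# Single-row DataFrame holding the maximum id
df1 = df.agg(max('id'))

# Needed on Spark 2.x for a join without a condition; enabled by default since Spark 3.0
spark.sql("set spark.sql.crossJoin.enabled=true")

# cross join
df.join(df1)

# or
df.crossJoin(df1).show()
# +---+----+-------+
# | id|name|max(id)|
# +---+----+-------+
# |  1|   a|      3|
# |  2|   b|      3|
# |  3|   c|      3|
# +---+----+-------+
```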

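2. Using Window function:

```python
from pyspark.sql import Window
from pyspark.sql.functions import col, max, monotonically_increasing_id

# Frame spanning every row, so each row sees the global maximum.
# No partitionBy: Spark moves all rows into a single partition for this window.
w = Window.orderBy(monotonically_increasing_id()).rowsBetween(
    Window.unboundedPreceding, Window.unboundedFollowing
)
df.withColumn("max", max(col("id")).over(w)).show()
# +---+----+---+
# | id|name|max|
# +---+----+---+
# |  1|   a|  3|
# |  2|   b|  3|
# |  3|   c|  3|
# +---+----+---+
```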

3. Using variable substitution:

```python
from pyspark.sql.functions import lit, max

# Collect the maximum to the driver as a plain Python value
max_value = df.agg(max("id")).collect()[0][0]

df.withColumn("max", lit(max_value)).show()

# or wrap the value in a literal Column first
max_value = lit(df.agg(max("id")).collect()[0][0])
type(max_value)
# <class 'pyspark.sql.column.Column'>

df.withColumn("max", max_value).show()
# +---+----+---+
# | id|name|max|
# +---+----+---+
# |  1|   a|  3|
# |  2|   b|  3|
# |  3|   c|  3|
# +---+----+---+
```

4. Using Spark SQL:

```python
from pyspark.sql.functions import col, max

df.createOrReplaceTempView("tmp")

# cross join against a one-row subquery
spark.sql("select * from tmp cross join (select max(id) max_val from tmp) t1").show()

# window function over an unbounded frame
spark.sql("select *,max(id) over(order by id rows between unbounded preceding and unbounded following) as max_val from tmp").show()

# substitute the collected value into the query text
max_value = df.agg(max(col("id"))).collect()[0][0]
spark.sql("select *,{0} as max_val from tmp".format(max_value)).show()
# +---+----+-------+
# | id|name|max_val|
# +---+----+-------+
# |  1|   a|      3|
# |  2|   b|      3|
# |  3|   c|      3|
# +---+----+-------+
```
