Python AssertionError: all exprs should be Column

z9ju0rcb  posted on 2023-06-28 in Python

I am combining two PySpark DataFrames with a union and then aggregating, as follows:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:

AssertionError: all exprs should be Column

What is wrong?


bt1cpqcv  #1

exprs = [max(x) for x in ["col1","col2"]]

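This calls Python's built-in max on each string, which returns the character with the largest ASCII value, so the result is ['o', 'o'], a list of plain strings rather than Column objects.
Use the correct max, the one from pyspark.sql.functions: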

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]
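
To see why the original list fails the check in agg, here is a minimal sketch in plain Python (no Spark needed). The variable name `names` is introduced only for illustration; everything else comes from the question.

```python
# The built-in max, applied to a string, compares its characters
# and returns the largest one.
names = ["col1", "col2"]
exprs = [max(x) for x in names]

print(exprs)                               # ['o', 'o']
print([type(e).__name__ for e in exprs])   # ['str', 'str']
```

Since every element is a str and not a Column, passing `*exprs` to agg triggers the "all exprs should be Column" assertion.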

xwmevbvl  #2

Try the code below:

from pyspark.sql import functions as F
exprs = [F.max(x) for x in ["col1", "col2"]]
print(*exprs)
