I'm unioning two PySpark DataFrames like this:
exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)
But I get this error:
AssertionError: all exprs should be Column
What's wrong?
bt1cpqcv1#
exprs = [max(x) for x in ["col1","col2"]]

calls Python's builtin max, which, applied to a string, returns the character with the highest code point, so exprs becomes ['o', 'o']. Reference the correct max, pyspark.sql.functions.max, instead:

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]
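To see why the builtin max misbehaves here, a quick pure-Python check (no Spark needed):

```python
# Python's builtin max iterates a string's characters and returns
# the one with the highest Unicode code point.
cols = ["col1", "col2"]
exprs = [max(x) for x in cols]
print(exprs)  # -> ['o', 'o'] since 'o' > 'l' > 'c' > '1' and 'o' > '2'
```

These strings are not Column expressions, which is exactly why agg raises "AssertionError: all exprs should be Column".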
xwmevbvl2#
Try the code below:

from pyspark.sql import functions as F
exprs = [F.max(x) for x in ["col1","col2"]]
print(*exprs)
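For reference, here is what the corrected pipeline computes, sketched in plain Python with a dict-based group-by-max standing in for Spark's union/groupBy/agg (the sample rows and values are made up for illustration; the column names campk, ppk, col1, col2 come from the question):

```python
# Plain-Python stand-in for:
#   df1.union(df2).groupBy(['campk', 'ppk']).agg(F.max('col1'), F.max('col2'))
rows = [
    # (campk, ppk, col1, col2) — rows from df1 followed by rows from df2
    ("a", 1, 10, 5),
    ("a", 1, 7, 9),
    ("b", 2, 3, 4),
]
groups = {}
for campk, ppk, col1, col2 in rows:
    key = (campk, ppk)
    if key not in groups:
        groups[key] = [col1, col2]
    else:
        groups[key][0] = max(groups[key][0], col1)  # per-group max of col1
        groups[key][1] = max(groups[key][1], col2)  # per-group max of col2
print(groups)  # {('a', 1): [10, 9], ('b', 2): [3, 4]}
```

The per-group max here uses the builtin max on numbers, which is fine; the bug in the question was applying the builtin max to column-name strings where a Column expression was required.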