How can I use the round function together with group by in PySpark? I have a Spark DataFrame and need to produce a result from it using group by and round.
data1 = [{'Name':'Jhon','ID':21.528,'Add':'USA','ID_2':'30.90'},
{'Name':'Joe','ID':3.69,'Add':'USA','ID_2':'12.80'},
{'Name':'Tina','ID':2.48,'Add':'IND','ID_2':'11.07'},
{'Name':'Jhon','ID':22.22, 'Add':'USA','ID_2':'34.87'},
{'Name':'Joe','ID':5.33,'Add':'INA','ID_2':'56.89'}]
a = sc.parallelize(data1)
In SQL the query would look something like:
SELECT COUNT(ID) AS newid, COUNT(ID_2) AS secondaryid,
       ROUND(([newid] + [secondaryid]) / [newid] * 200, 1) AS [NEW_PERCENTAGE]
FROM DATA1
GROUP BY Name
1 Answer
You can't use round inside the groupBy itself; you need to aggregate first and then create a new column afterwards:
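A minimal sketch of that pattern, assuming a SparkSession named spark and the data1 list from the question (the column names newid, secondaryid and NEW_PERCENTAGE just mirror the aliases in the SQL above):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Build a DataFrame directly from the sample records in the question
df = spark.createDataFrame(data1)

# Aggregate first, then add the rounded percentage as a new column
result = (df.groupBy('Name')
            .agg(F.count('ID').alias('newid'),
                 F.count('ID_2').alias('secondaryid'))
            .withColumn('NEW_PERCENTAGE',
                        F.round((F.col('newid') + F.col('secondaryid'))
                                / F.col('newid') * 200, 1)))

result.show()

Computing the rounded value in withColumn after the agg keeps the aggregation expressions simple and lets you reuse the newid and secondaryid columns by name.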