I have a PySpark DataFrame requirement and need some input.
Here is the scenario:
df1 schema:
root
|-- applianceName: string (nullable = true)
|-- customer: string (nullable = true)
|-- daysAgo: integer (nullable = true)
|-- countAnomaliesByDay: long (nullable = true)
Sample Data:
applianceName | customer | daysAgo | countAnomaliesByDay
app1          | cust1    | 0       | 100
app1          | cust1    | 1       | 200
app1          | cust1    | 2       | 300
app1          | cust1    | 3       | 400
app1          | cust1    | 4       | 500
app1          | cust1    | 5       | 600
app1          | cust1    | 6       | 700
To df1, I need to add the columns day0, day1, day2, day3, day4, day5, day6, as shown below:
applianceName | customer | day0 | day1 | day2 | day3 | day4 | day5 | day6
app1          | cust1    | 100  | 200  | 300  | 400  | 500  | 600  | 700
i.e. column day0 will hold countAnomaliesByDay where daysAgo = 0, column day1 will hold countAnomaliesByDay where daysAgo = 1, and so on.
How can I achieve this?
TIA!
1 Answer
I hope this is useful for your solution. I used PySpark's pivot function to do this.