Alternative to groupBy and collect_list in Scala

zf9nrax1 posted on 2021-07-09 in Spark

Is there a way to create the output DataFrame below from the given input DataFrame without using groupBy and collect_list?
Input DataFrame schema:

root
 |-- GPS: struct (nullable = false)
 |    |-- requestid: string (nullable = true)
 |    |-- timestamp: double (nullable = true)
 |    |-- GPSLatitude: double (nullable = true)
 |    |-- GPSLongitude: double (nullable = true)
 |-- requestid: string (nullable = true)

Input DataFrame:

| GPS | requestid |
|---|---|
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368673,"GPSLatitude":40.13587319463796,"GPSLongitude":-75.15846220892067} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368674,"GPSLatitude":40.135924326024096,"GPSLongitude":-75.15865512908896} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368675,"GPSLatitude":40.13599278802667,"GPSLongitude":-75.1587673291171} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368676,"GPSLatitude":40.136083261014484,"GPSLongitude":-75.15885143441842} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368677,"GPSLatitude":40.13616687156273,"GPSLongitude":-75.1589827147081} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368833,"GPSLatitude":40.14496631033691,"GPSLongitude":-75.17361394861283} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368834,"GPSLatitude":40.14509290798243,"GPSLongitude":-75.17385201087406} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368835,"GPSLatitude":40.145218343731,"GPSLongitude":-75.17407707132585} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368836,"GPSLatitude":40.145350564938425,"GPSLongitude":-75.17430271187274} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368837,"GPSLatitude":40.14548270568285,"GPSLongitude":-75.17452650958782} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2","timestamp":1596368838,"GPSLatitude":40.14560747391316,"GPSLongitude":-75.17474553105055} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2 |
| {"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2","timestamp":1596368839,"GPSLatitude":40.14560753483339,"GPSLongitude":-75.17474563799348} | 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2 |
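To reproduce the problem locally, here is a minimal sketch that builds the sample input as a DataFrame. It assumes spark-sql 2.4.x for Scala 2.11 is on the classpath; the case classes `Gps` and `InputRow` and the object `SampleInput` are hypothetical names, and the rows are the twelve sample rows from the input table. Field names and types follow the input schema, but nullability flags may not match exactly (for example, a primitive `Double` field is reported as non-nullable).

```scala
// Assumes spark-sql 2.4.x (Scala 2.11) on the classpath.
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case classes mirroring the input schema from the question.
case class Gps(requestid: String, timestamp: Double, GPSLatitude: Double, GPSLongitude: Double)
case class InputRow(GPS: Gps, requestid: String)

object SampleInput {
  // Request-id prefix shared by every sample row; each request appends _0, _1 or _2.
  val baseId = "2b7bfbd4cf7c3124a517a112851ec6015b1ef85b"

  // (suffix, timestamp, latitude, longitude) for the twelve sample rows.
  val rows: Seq[(String, Double, Double, Double)] = Seq(
    ("_0", 1596368673.0, 40.13587319463796, -75.15846220892067),
    ("_0", 1596368674.0, 40.135924326024096, -75.15865512908896),
    ("_0", 1596368675.0, 40.13599278802667, -75.1587673291171),
    ("_0", 1596368676.0, 40.136083261014484, -75.15885143441842),
    ("_0", 1596368677.0, 40.13616687156273, -75.1589827147081),
    ("_1", 1596368833.0, 40.14496631033691, -75.17361394861283),
    ("_1", 1596368834.0, 40.14509290798243, -75.17385201087406),
    ("_1", 1596368835.0, 40.145218343731, -75.17407707132585),
    ("_1", 1596368836.0, 40.145350564938425, -75.17430271187274),
    ("_1", 1596368837.0, 40.14548270568285, -75.17452650958782),
    ("_2", 1596368838.0, 40.14560747391316, -75.17474553105055),
    ("_2", 1596368839.0, 40.14560753483339, -75.17474563799348)
  )

  // Builds a DataFrame with columns GPS (struct) and requestid (string).
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    rows.map { case (suffix, ts, lat, lon) =>
      val id = baseId + suffix
      InputRow(Gps(id, ts, lat, lon), id)
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sample-input").getOrCreate()
    val df = build(spark)
    df.printSchema()
    df.show(false)
    spark.stop()
  }
}
```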
Output DataFrame:

| requestid | GPS |
|---|---|
| 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0 | [{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368673,"GPSLatitude":40.13587319463796,"GPSLongitude":-75.15846220892067},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368674,"GPSLatitude":40.135924326024096,"GPSLongitude":-75.15865512908896},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368675,"GPSLatitude":40.13599278802667,"GPSLongitude":-75.1587673291171},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368676,"GPSLatitude":40.136083261014484,"GPSLongitude":-75.15885143441842},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_0","timestamp":1596368677,"GPSLatitude":40.13616687156273,"GPSLongitude":-75.1589827147081}] |
| 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1 | [{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368833,"GPSLatitude":40.14496631033691,"GPSLongitude":-75.17361394861283},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368834,"GPSLatitude":40.14509290798243,"GPSLongitude":-75.17385201087406},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368835,"GPSLatitude":40.145218343731,"GPSLongitude":-75.17407707132585},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368836,"GPSLatitude":40.145350564938425,"GPSLongitude":-75.17430271187274},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_1","timestamp":1596368837,"GPSLatitude":40.14548270568285,"GPSLongitude":-75.17452650958782}] |
| 2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2 | [{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2","timestamp":1596368838,"GPSLatitude":40.14560747391316,"GPSLongitude":-75.17474553105055},{"requestid":"2b7bfbd4cf7c3124a517a112851ec6015b1ef85b_2","timestamp":1596368839,"GPSLatitude":40.14560753483339,"GPSLongitude":-75.17474563799348}] |

Output DataFrame schema:

root
 |-- requestid: string (nullable = true)
 |-- GPS: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- requestid: string (nullable = true)
 |    |    |-- timestamp: double (nullable = true)
 |    |    |-- GPSLatitude: double (nullable = true)
 |    |    |-- GPSLongitude: double (nullable = true)
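For reference, the groupBy plus collect_list version that the question wants to replace might look like the sketch below. It assumes spark-sql 2.4.x for Scala 2.11; `Gps`, `InputRow`, `CollectListBaseline` and `sample` are hypothetical names, and the sample uses four of the rows from the input table. Note that collect_list does not guarantee the order of elements in the resulting array (it depends on row order after the shuffle), and the printed nullability flags may not match the schema above exactly.

```scala
// Assumes spark-sql 2.4.x (Scala 2.11) on the classpath.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical case classes mirroring the input schema from the question.
case class Gps(requestid: String, timestamp: Double, GPSLatitude: Double, GPSLongitude: Double)
case class InputRow(GPS: Gps, requestid: String)

object CollectListBaseline {
  // The approach the question wants to avoid: one output row per requestid,
  // with all GPS structs for that id gathered into an array column named GPS.
  def group(input: DataFrame): DataFrame =
    input.groupBy(col("requestid")).agg(collect_list(col("GPS")).as("GPS"))

  // Four rows from the sample data (request ids _0, _1 and _2).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val base = "2b7bfbd4cf7c3124a517a112851ec6015b1ef85b"
    Seq(
      ("_0", 1596368673.0, 40.13587319463796, -75.15846220892067),
      ("_0", 1596368674.0, 40.135924326024096, -75.15865512908896),
      ("_1", 1596368833.0, 40.14496631033691, -75.17361394861283),
      ("_2", 1596368838.0, 40.14560747391316, -75.17474553105055)
    ).map { case (suffix, ts, lat, lon) =>
      InputRow(Gps(base + suffix, ts, lat, lon), base + suffix)
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("collect-list-baseline").getOrCreate()
    val out = group(sample(spark))
    out.printSchema()
    out.show(false)
    spark.stop()
  }
}
```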

Versions: Spark 2.4.5 and Scala 2.11
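One option that avoids both `groupBy` and `collect_list` as API calls, and is available in Spark 2.4 with Scala 2.11, is the typed Dataset API: `groupByKey` followed by `mapGroups`. This is a minimal sketch, not a tested answer: it assumes spark-sql 2.4.x on the classpath, and `Gps`, `InputRow`, `OutputRow`, `GroupByKeyAlternative` and `sample` are hypothetical names. It still shuffles every row for a requestid to one task, so it is not cheaper than groupBy, and it deserializes rows into JVM objects, which can make it slower than the built-in aggregate. A DataFrame shaped like the question's input could be converted with `df.as[InputRow]` first.

```scala
// Assumes spark-sql 2.4.x (Scala 2.11) on the classpath.
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes mirroring the input and output schemas from the question.
case class Gps(requestid: String, timestamp: Double, GPSLatitude: Double, GPSLongitude: Double)
case class InputRow(GPS: Gps, requestid: String)
case class OutputRow(requestid: String, GPS: Seq[Gps])

object GroupByKeyAlternative {
  // Typed grouping: no groupBy and no collect_list, but rows with the same
  // requestid are still shuffled to the same task.
  def group(spark: SparkSession, input: Dataset[InputRow]): Dataset[OutputRow] = {
    import spark.implicits._
    input
      .groupByKey(_.requestid)
      .mapGroups { (id, rows) =>
        // Copy the iterator into a Vector before returning and sort by timestamp,
        // so the array order does not depend on shuffle order.
        OutputRow(id, rows.map(_.GPS).toVector.sortBy(_.timestamp))
      }
  }

  // Four rows from the sample data, deliberately out of timestamp order.
  def sample(spark: SparkSession): Dataset[InputRow] = {
    import spark.implicits._
    val base = "2b7bfbd4cf7c3124a517a112851ec6015b1ef85b"
    Seq(
      ("_0", 1596368674.0, 40.135924326024096, -75.15865512908896),
      ("_0", 1596368673.0, 40.13587319463796, -75.15846220892067),
      ("_1", 1596368833.0, 40.14496631033691, -75.17361394861283),
      ("_2", 1596368838.0, 40.14560747391316, -75.17474553105055)
    ).map { case (suffix, ts, lat, lon) =>
      InputRow(Gps(base + suffix, ts, lat, lon), base + suffix)
    }.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("group-by-key").getOrCreate()
    val out = group(spark, sample(spark))
    out.printSchema()
    out.show(false)
    spark.stop()
  }
}
```

Sorting inside `mapGroups` is a design choice: the question's expected output lists each request's points in timestamp order, and doing the sort explicitly avoids relying on the order in which rows arrive after the shuffle.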
