Getting null values with WindowSpec in Spark/Java

zqdjd7g9 · posted 2021-05-29 in Spark

I have a DataFrame:

+----------+------------+------------+--------------------+
|   acc    |id_Vehicule |id_Device   |dateTracking        | 
+----------+------------+------------+--------------------+
|  1       |  1         |  2         |2020-02-12 14:50:00 |            
|  0       |  1         |  2         |2020-02-12 14:59:00 | 
|  0       |  2         |  3         |2020-02-12 15:10:00 |
|  1       |  2         |  3         |2020-02-12 15:20:00 |
+----------+------------+------------+--------------------+

I want to get the following output:

+----------+------------+------------+--------------------+----------------+
|   acc    |id_Vehicule |id_Device   |dateTracking        |  acc_previous  |
+----------+------------+------------+--------------------+----------------+
|  1       |  1         |  2         |2020-02-12 14:50:00 | null           |                
|  0       |  1         |  2         |2020-02-12 14:59:00 |  1             |
|  0       |  2         |  3         |2020-02-12 15:10:00 |  null          |
|  1       |  2         |  3         |2020-02-12 15:20:00 |  0             |
+----------+------------+------------+--------------------+----------------+

I tried the following code:

WindowSpec w = org.apache.spark.sql.expressions.Window
        .partitionBy("id_Vehicule", "id_Device", "dateTracking")
        .orderBy("dateTracking");
Dataset<Row> df = df1.withColumn("acc_previous", lag("acc", 1).over(w));
df.show();

But I got this result:

+----------+------------+------------+--------------------+----------------+
|   acc    |id_Vehicule |id_Device   |dateTracking        |  acc_previous  |
+----------+------------+------------+--------------------+----------------+
|  1       |  1         |  2         |2020-02-12 14:50:00 | null           |                
|  0       |  1         |  2         |2020-02-12 14:59:00 | null           |
|  0       |  2         |  3         |2020-02-12 15:10:00 | null           |
|  1       |  2         |  3         |2020-02-12 15:20:00 | null           |
+----------+------------+------------+--------------------+----------------+

I would be very grateful for any ideas.

Answer by eoigrqb6 (accepted):

I found the solution; maybe it will help someone else. The problem came from the "dateTracking" column: it should not be used as a partition column. Because dateTracking is unique per row here, partitioning on it puts every row in its own single-row partition, so lag() never has a previous row to read and always returns null. Removing it from partitionBy (and keeping it only in orderBy) fixes the issue:

// Partition only by the vehicle/device keys; dateTracking is used for ordering.
WindowSpec w = org.apache.spark.sql.expressions.Window
        .partitionBy("id_Vehicule", "id_Device")
        .orderBy("dateTracking");
Dataset<Row> df = df1.withColumn("acc_previous", lag("acc", 1).over(w));
df.show();
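
For completeness, here is a minimal self-contained sketch of the fix. The local SparkSession setup, the hard-coded sample rows, the schema, and the class name LagExample are illustrative assumptions added for this write-up; only the window definition and the lag() call come from the answer above.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.lag;

public class LagExample {
    public static void main(String[] args) {
        // Hypothetical local session, just for running the example.
        SparkSession spark = SparkSession.builder()
                .appName("LagExample")
                .master("local[*]")
                .getOrCreate();

        // Rebuild the sample DataFrame from the question.
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, 1, 2, "2020-02-12 14:50:00"),
                RowFactory.create(0, 1, 2, "2020-02-12 14:59:00"),
                RowFactory.create(0, 2, 3, "2020-02-12 15:10:00"),
                RowFactory.create(1, 2, 3, "2020-02-12 15:20:00"));
        StructType schema = new StructType()
                .add("acc", DataTypes.IntegerType)
                .add("id_Vehicule", DataTypes.IntegerType)
                .add("id_Device", DataTypes.IntegerType)
                .add("dateTracking", DataTypes.StringType);
        Dataset<Row> df1 = spark.createDataFrame(rows, schema);

        // Partition by the vehicle/device keys only; order by the timestamp
        // so lag() can see the previous row within each partition.
        WindowSpec w = Window
                .partitionBy("id_Vehicule", "id_Device")
                .orderBy("dateTracking");

        Dataset<Row> df = df1.withColumn("acc_previous", lag("acc", 1).over(w));
        df.show();

        spark.stop();
    }
}

Within each (id_Vehicule, id_Device) partition, rows are ordered by dateTracking, so lag("acc", 1) returns the previous row's acc value and null for the first row of each partition, which matches the desired output.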
