当一个字段符合某些条件时,我试图延迟它,因为我需要使用过滤器,所以我使用max函数来延迟它,因为延迟函数本身不能按我需要的方式工作。我已经能够使用下面的id\u event\u log代码来完成它,但是当我将max中的id\u event\u log更改为ensaio列时,我会延迟ensaio列,它不能正常工作。下面的例子。
数据集:
+------------+---------+------+
|ID_EVENT_LOG|ID_PAINEL|ENSAIO|
+------------+---------+------+
| 1| 1| null|
| 2| 1| null|
| 3| 1|INICIO|
| 4| 1| null|
| 5| 1| null|
| 6| 1| null|
| 7| 1| FIM|
| 8| 1| null|
| 9| 1| null|
| 10| 1| null|
| 11| 2| FIM|
| 12| 2| FIM|
| 13| 2|INICIO|
| 14| 2| null|
| 15| 2| FIM|
+------------+---------+------+
工作代码
DFReadFile = spark.read.format('csv').option("header", "true").option('sep',',').load('12_delete_between_inicio_fim_v4.csv')
DFReadFile.show()
DFReadFile.createOrReplaceTempView("12_delete_between_inicio_fim")
sqlDF = spark.sql("SELECT *, \
CASE \
WHEN (ENSAIO like '%null%') THEN \
MAX(CASE WHEN (ENSAIO like '%INICIO%') OR (ENSAIO like '%FIM%') THEN ID_EVENT_LOG END) \
OVER (PARTITION BY ID_PAINEL ORDER BY int(ID_EVENT_LOG) RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) \
ELSE 0 \
END as TESTE \
FROM 12_delete_between_inicio_fim")
sqlDF.show()
结果:
+------------+---------+------+-----+
|ID_EVENT_LOG|ID_PAINEL|ENSAIO|TESTE|
+------------+---------+------+-----+
| 1| 1| null| null|
| 2| 1| null| null|
| 3| 1|INICIO| 0|
| 4| 1| null| 3|
| 5| 1| null| 3|
| 6| 1| null| 3|
| 7| 1| FIM| 0|
| 8| 1| null| 7|
| 9| 1| null| 7|
| 10| 1| null| 7|
| 11| 2| FIM| 0|
| 12| 2| FIM| 0|
| 13| 2|INICIO| 0|
| 14| 2| null| 13|
| 15| 2| FIM| 0|
+------------+---------+------+-----+
要解决的错误:
数据集相同
无效代码(唯一的更改是ensaio的id\事件\日志):
DFReadFile = spark.read.format('csv').option("header", "true").option('sep',',').load('12_delete_between_inicio_fim_v4.csv')
DFReadFile.show()
DFReadFile.createOrReplaceTempView("12_delete_between_inicio_fim")
sqlDF = spark.sql("SELECT *, \
CASE \
WHEN (ENSAIO like '%null%') THEN \
MAX(CASE WHEN (ENSAIO like '%INICIO%') OR (ENSAIO like '%FIM%') THEN ENSAIO END) \
OVER (PARTITION BY ID_PAINEL ORDER BY int(ID_EVENT_LOG) RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) \
ELSE 0 \
END as TESTE \
FROM 12_delete_between_inicio_fim")
sqlDF.show()
结果:
+------------+---------+------+------+
|ID_EVENT_LOG|ID_PAINEL|ENSAIO| TESTE|
+------------+---------+------+------+
| 1| 1| null| null|
| 2| 1| null| null|
| 3| 1|INICIO| 0|
| 4| 1| null|INICIO|
| 5| 1| null|INICIO|
| 6| 1| null|INICIO|
| 7| 1| FIM| 0|
| 8| 1| null|INICIO|
| 9| 1| null|INICIO|
| 10| 1| null|INICIO|
| 11| 2| FIM| 0|
| 12| 2| FIM| 0|
| 13| 2|INICIO| 0|
| 14| 2| null|INICIO|
| 15| 2| FIM| 0|
+------------+---------+------+------+
预期结果:
+------------+---------+------+------+
|ID_EVENT_LOG|ID_PAINEL|ENSAIO| TESTE|
+------------+---------+------+------+
| 1| 1| null| null|
| 2| 1| null| null|
| 3| 1|INICIO| 0|
| 4| 1| null|INICIO|
| 5| 1| null|INICIO|
| 6| 1| null|INICIO|
| 7| 1| FIM| 0|
| 8| 1| null| FIM|
| 9| 1| null| FIM|
| 10| 1| null| FIM|
| 11| 2| FIM| 0|
| 12| 2| FIM| 0|
| 13| 2|INICIO| 0|
| 14| 2| null|INICIO|
| 15| 2| FIM| 0|
+------------+---------+------+------+
先谢谢你
暂无答案!
目前还没有任何答案,快来回答吧!