Dataframe1
+---+--------+------------+
|key|dc_count|dc_day_count|
+---+--------+------------+
|123|13      |66          |
|123|13      |12          |
+---+--------+------------+
Rule Dataframe
+---+-------------+--------------+--------+
|key|rule_dc_count|rule_day_count|rule_out|
+---+-------------+--------------+--------+
|123|2            |30            |139     |
|123|null         |null          |64      |
|124|2            |30            |139     |
|124|null         |null          |64      |
+---+-------------+--------------+--------+
If dc_count > rule_dc_count and dc_day_count > rule_day_count, populate the corresponding rule_out;
otherwise, populate the other rule_out.
Expected output
+---+--------+
|key|rule_out|
+---+--------+
|123|139     |
|124|64      |
+---+--------+
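One reading of the rule that reproduces the expected output, sketched in plain Python so the logic is visible without Spark. It assumes each key has exactly one rule row with thresholds and one fallback row with nulls, and that a key gets the thresholded rule_out if any of its Dataframe1 rows passes both comparisons. The function name pick_rule_out is hypothetical.

```python
# Rows copied from Dataframe1 and the rule Dataframe.
df1 = [
    {"key": 123, "dc_count": 13, "dc_day_count": 66},
    {"key": 123, "dc_count": 13, "dc_day_count": 12},
]
rules = [
    {"key": 123, "rule_dc_count": 2, "rule_day_count": 30, "rule_out": 139},
    {"key": 123, "rule_dc_count": None, "rule_day_count": None, "rule_out": 64},
    {"key": 124, "rule_dc_count": 2, "rule_day_count": 30, "rule_out": 139},
    {"key": 124, "rule_dc_count": None, "rule_day_count": None, "rule_out": 64},
]


def pick_rule_out(df1, rules):
    """Return {key: rule_out} under the assumptions stated above."""
    result = {}
    for key in sorted({r["key"] for r in rules}):
        key_rules = [r for r in rules if r["key"] == key]
        # Assumption: one thresholded row and one null (fallback) row per key.
        threshold = next(r for r in key_rules if r["rule_dc_count"] is not None)
        fallback = next(r for r in key_rules if r["rule_dc_count"] is None)
        # A key with no Dataframe1 rows never "hits" and takes the fallback.
        hit = any(
            d["dc_count"] > threshold["rule_dc_count"]
            and d["dc_day_count"] > threshold["rule_day_count"]
            for d in df1
            if d["key"] == key
        )
        result[key] = threshold["rule_out"] if hit else fallback["rule_out"]
    return result


print(pick_rule_out(df1, rules))
```

Under this reading, key 124 falls back to 64 simply because it has no rows in Dataframe1.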
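2 answers
11dmarpk1#
PySpark version
The challenge here is to get the second row's value for a key within the same column; the lead() analytic function resolves this.
Create the Dataframes here
Logic to get the desired result
Output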
z2acfund2#
Assuming the expected output is -
The query below should work -
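As a rough sketch of a query of this kind, targeting the expected output table from the question: the statement below is run through Python's sqlite3 purely so it executes without a Spark session, which is an assumption of this example. The table names df1 and rules are hypothetical, and it assumes one thresholded rule row and one null fallback row per key. It uses only joins, GROUP BY, CASE, and MAX, so the same statement should be usable with spark.sql after registering the two Dataframes as temp views, though that is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE df1 (key INTEGER, dc_count INTEGER, dc_day_count INTEGER);
    CREATE TABLE rules (key INTEGER, rule_dc_count INTEGER,
                        rule_day_count INTEGER, rule_out INTEGER);
    INSERT INTO df1 VALUES (123, 13, 66), (123, 13, 12);
    INSERT INTO rules VALUES (123, 2, 30, 139), (123, NULL, NULL, 64),
                             (124, 2, 30, 139), (124, NULL, NULL, 64);
    """
)

# r = thresholded rule row, f = fallback rule row, d = matching data rows.
QUERY = """
SELECT r.key,
       CASE WHEN MAX(CASE WHEN d.dc_count > r.rule_dc_count
                           AND d.dc_day_count > r.rule_day_count
                          THEN 1 ELSE 0 END) = 1
            THEN MAX(r.rule_out)
            ELSE MAX(f.rule_out)
       END AS rule_out
FROM rules r
JOIN rules f ON f.key = r.key AND f.rule_dc_count IS NULL
LEFT JOIN df1 d ON d.key = r.key
WHERE r.rule_dc_count IS NOT NULL
GROUP BY r.key
ORDER BY r.key
"""

query_rows = conn.execute(QUERY).fetchall()
print(query_rows)
```

The self-join plays the role that lead() plays in a window-based approach: it places the fallback rule_out next to the thresholded row for the same key.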