I have a dataframe df, as shown below:
VehNum Control_circuit control_circuit_status partnumbers errors Flag
4234456 DOC ok A567UR Software Issue 0
4234456 DOC not_okay A568UR Software Issue 1
4234456 DOC not_okay A569UR Hardware issue 2
4234457 ACR ok A234TY Hardware issue 0
4234457 ACR ok A235TY Hardware issue 0
4234457 ACR ok A234TY Hardware issue 0
4234487 QWR ok A276TY Hardware issue 0
4234487 QWR not_okay A872UR Hardware issue 1
3423448 QWR not_okay A872UR Hardware issue 1
I want to add a new column named "Control_Flag" and do the following: for each VehNum and Control_circuit, if the Control_circuit has a "control_circuit_status" of "ok", the "Control_Flag" value should be 0, otherwise 1.
The result should look like this:
VehNum Control_circuit control_circuit_status partnumbers errors Flag Control_Flag
4234456 DOC ok A567UR Software Issue 0 0
4234456 DOC not_okay A568UR Software Issue 1 0
4234456 DOC not_okay A569UR Hardware issue 2 0
4234457 ACR ok A234TY Hardware issue 0 0
4234457 ACR ok A235TY Hardware issue 0 0
4234457 ACR ok A234TY Hardware issue 0 0
4234487 QWR ok A276TY Hardware issue 0 1
4234487 QWR not_okay A872UR Hardware issue 1 1
3423448 QWR not_okay A872UR Hardware issue 1 1
How can I achieve this with PySpark?
1 Answer
Here is the solution:
Output: