假设我有一个这样的Dataframe B_C
是col的concat吗 B
还有上校 C
,和列 selected_B_C
是一个由几个 B_C
组内的列。
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
|A |grp_count_A|B |C |B_C |D |selected_B_C |
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
|1 |6 |30261.41|20091201|30261.41_20091201|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |30261.41|20081201|30261.41_20081201|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |39879.85|20080601|39879.85_20080601|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |69804.42|20080117|69804.42_20080117|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |99950.3 |20090301|99950.3_20090301 |99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |99999.23|20080118|99999.23_20080118|99945.83|[30261.41_20091201, 39879.85_20080601]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |351378.0|20180620|351378.0_20180620|183600.0|[[76498.0_20150501, 76498.0_20150501]]|
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
我想附加一列 selected
它需要一个值 1
,如果是一行,列 B_C
在col中找到 selected_B_C
,否则 0
,因此最终的Dataframe如下所示。
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
|A |grp_count_A|B |C |B_C |D |selected_B_C |selected|
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
|1 |6 |30261.41|20081201|30261.41_20081201|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |30261.41|20091201|30261.41_20091201|99945.83|[30261.41_20091201, 39879.85_20080601]|1 |
|1 |6 |39879.85|20080601|39879.85_20080601|99945.83|[30261.41_20091201, 39879.85_20080601]|1 |
|1 |6 |69804.42|20080117|69804.42_20080117|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |99950.3 |20090301|99950.3_20090301 |99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |99999.23|20080118|99999.23_20080118|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|1 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|1 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|0 |
|2 |4 |351378.0|20180620|351378.0_20180620|183600.0|[[76498.0_20150501, 76498.0_20150501]]|0 |
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
对col来说最棘手的部分 selected
我只需要一个值在 selected_B_C
有价值 1
对于选定的
例如在组中 2
,即使有3条记录的值为 76498.0_20150501
对于col B_C
,我只想要两张组记录 2
谁的价值是 76498.0_20150501
有价值 1
为了 selected
,作为 selected_B_C
对于组 2
正好有2个元素有值 76498.0_20150501
在col中 selected_B_C
暂无答案!
目前还没有任何答案,快来回答吧!