I have some data, that needs to be clusterised into groups. That should be done by a few predifined conditions.
Suppose we have the following table:
d = {'ID': [100, 101, 102, 103, 104, 105],
'col_1': [12, 3, 7, 13, 19, 25],
'col_2': [3, 1, 3, 3, 2, 4]
}
df = pd.DataFrame(data=d)
df.head()
Here, I want to group ID
based on the following ranges, conditions, on col_1
and col_2
.
For col_1
I divide values into following groups: [0, 10]
, [11, 15]
, [16, 20]
, [20, +inf]
For col_2
just use the df['col_2'].unique()
values: [1]
, [2]
, [3]
, [4]
.
The desired groupping is in group_num
column:
- notice, that
0
and3
rows have the same group number and the order, in which group number is assigned*.
For now, I only came up with if-elif
function to pre-define all the groups. It's not the solution for now cause in my real task there are far more ranges and confitions.
My code snippet, if it's relevant:
# This logic is not working cause here I have to predefine all the groups configurations, aka numbers,
# but I want to make groups "dymanicly":
# first group created and if the next row is not in that group -> create new one
def groupping(val_1, val_2):
# not using match case here, cause my Python < 3.10
if ((val_1 >= 0) and (val_1 <10)) and (val_2 == 1):
return 1
elif ((val_1 >= 0) and (val_1 <10)) and (val_2 == 2):
return 2
elif ...
...
df['group_num'] = df.apply(lambda x: groupping(x.col_1, x.col_2), axis=1)
2条答案
按热度按时间j8ag8udp1#
为检查组创建 Dataframe
df1
按
df1
检查分组输出:
输出到
group_num
列实验结果:
wn9m85ua2#
我不确定是否理解完整的逻辑,您不能使用
pandas.cut
:输出量: