How to label each group using df.groupby() in Python pandas? [duplicate]

Posted by h43kikqp on 2023-06-20 in Python

This question already has answers here:

Pandas - make a column dtype object or Factor (3 answers)
Closed 10 days ago.

Suppose we have a pandas DataFrame like the following:

  Questions  cnt similarity
0       ABC    1  [1, 2, 3]
1       abc    2  [1, 2, 3]
2       cba    3  [2, 3, 1]
3      abcd    4  [4, 5, 6]
4      dcsa    5  [2, 3, 1]
5      adcd    6  [4, 5, 6]
6      abcd    7  [1, 2, 3]
7       cba    8  [7, 8, 9]

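I need to add another column named cat, based on the similarity column. If two rows have the same similarity, they should be assigned to the same group. The expected output is shown below. Any input is appreciated. It is worth mentioning that the original dataset has 1M rows. Thank you.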

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4

twh00eeo #1

IIUC, you can use pd.factorize:

df["cat"] = pd.factorize(df["similarity"].astype(str))[0] + 1

Output:

print(df)

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
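
For reference, here is a minimal self-contained sketch of the same idea. It assumes that the similarity column holds Python lists (the question does not say whether they are lists or strings) and rebuilds the sample frame by hand. pd.factorize returns a pair (codes, uniques), and the codes are assigned in order of first appearance, which is why adding 1 gives the labels shown above.

```python
import pandas as pd

# Sample data rebuilt from the question; assuming "similarity" holds lists.
df = pd.DataFrame({
    "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
    "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

# Lists are unhashable, so factorize the string form of each list instead.
codes, uniques = pd.factorize(df["similarity"].astype(str))

# codes are 0-based and follow order of first appearance; shift to 1-based.
df["cat"] = codes + 1
print(df)
```

If the column already contains strings, the astype(str) call is harmless and the result is the same.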

4ktjp1zp #2

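One approach is to use groupby.ngroup(). If similarity holds Python lists, which are not hashable, group on their string form. Passing sort=False numbers the groups in order of first appearance:

df['cat'] = df.groupby(df['similarity'].astype(str), sort=False).ngroup() + 1

Output:

  Questions  cnt similarity  cat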
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
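
The numbering that ngroup() produces depends on the sort argument of groupby. With the default sort=True, groups are numbered in sorted key order; with sort=False, they are numbered in the order they first appear. In the sample data the two orders happen to coincide, so the following sketch uses a small hypothetical frame (not taken from the question) in which they differ, again assuming the column holds lists:

```python
import pandas as pd

# Hypothetical data in which sorted order differs from order of appearance.
df = pd.DataFrame({"similarity": [[4, 5, 6], [1, 2, 3], [4, 5, 6]]})

# Group on the string form, since lists themselves are unhashable.
key = df["similarity"].astype(str)

# Default sort=True: labels follow the sorted order of the keys.
sorted_labels = (df.groupby(key).ngroup() + 1).tolist()

# sort=False: labels follow the order in which keys first appear.
first_seen_labels = (df.groupby(key, sort=False).ngroup() + 1).tolist()

print(sorted_labels)
print(first_seen_labels)
```

If the labels must match those of pd.factorize, sort=False is probably the safer choice.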
