I have a pandas DataFrame (df) that looks like this:
+---------+---------+------------+--------+
| Cluster | Country | Publishers | Assets |
+---------+---------+------------+--------+
| South | IT | SS | Asset1 |
| South | IT | SS | Asset2 |
| South | IT | SS | Asset3 |
| South | IT | ML | Asset1 |
| South | IT | ML | Asset2 |
| South | IT | ML | Asset3 |
| South | IT | TT | Asset1 |
| South | IT | TT | Asset2 |
| South | IT | TT | Asset3 |
| South | ES | SS | Asset1 |
| South | ES | SS | Asset2 |
+---------+---------+------------+--------+
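I want to create a new column, "Package", built from a cumulative count based on the following columns:
- Publishers
- Assets
The result should look like this: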
+---------+---------+------------+--------+---------+
| Cluster | Country | Publishers | Assets | Package |
+---------+---------+------------+--------+---------+
| South | IT | SS | Asset1 | 1 |
| South | IT | SS | Asset2 | 1a |
| South | IT | SS | Asset3 | 1b |
| South | IT | ML | Asset1 | 2 |
| South | IT | ML | Asset2 | 2a |
| South | IT | ML | Asset3 | 2b |
| South | IT | TT | Asset1 | 3 |
| South | IT | TT | Asset2 | 3a |
| South | IT | TT | Asset3 | 3b |
| South | ES | SS | Asset1 | 4 |
| South | ES | SS | Asset2 | 4a |
+---------+---------+------------+--------+---------+
So far I have tried
df['Package'] = df.groupby(['Cluster','Publishers']).cumcount()
but it does not work: the count resets to 0 at the start of each publisher group.
1 answer
sczxawaw1#
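You can use groupby.cumcount, but with a different grouper. You will also need the related groupby.ngroup, which numbers the groups themselves rather than the rows inside each group.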
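A minimal sketch of how the two calls might be combined. It assumes the grouper is Cluster, Country and Publishers (inferred from the expected table, where SS in IT and SS in ES get different numbers), and that the letter suffix is the row's position within its group. The variable names and the letter mapping are illustrative, not taken from the answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Cluster": ["South"] * 11,
    "Country": ["IT"] * 9 + ["ES"] * 2,
    "Publishers": ["SS"] * 3 + ["ML"] * 3 + ["TT"] * 3 + ["SS"] * 2,
    "Assets": ["Asset1", "Asset2", "Asset3"] * 3 + ["Asset1", "Asset2"],
})

# Assumed grouper: including Country keeps SS/IT and SS/ES apart.
# sort=False numbers the groups in order of first appearance.
group = df.groupby(["Cluster", "Country", "Publishers"], sort=False)

# ngroup() labels the groups 0, 1, 2, ...; add 1 so numbering starts at 1.
number = group.ngroup().add(1).astype(str)

# cumcount() labels rows within each group 0, 1, 2, ...;
# map 0 to "" and 1, 2, ... to "a", "b", ...
suffix = group.cumcount().map(
    lambda i: "" if i == 0 else chr(ord("a") + i - 1)
)

df["Package"] = number + suffix
print(df)
```

On the sample data, the Package column comes out as 1, 1a, 1b, 2, 2a, 2b, 3, 3a, 3b, 4, 4a. The letter mapping in this sketch only covers groups of up to 27 rows; larger groups would need a different suffix scheme.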