Pandas: cumulative count with labeling

Posted by eyh26e7m on 2022-11-20 in Other

I have a pandas DataFrame that looks like this:

+---------+---------+------------+--------+
| Cluster | Country | Publishers | Assets |
+---------+---------+------------+--------+
| South   | IT      | SS         | Asset1 |
| South   | IT      | SS         | Asset2 |
| South   | IT      | SS         | Asset3 |
| South   | IT      | ML         | Asset1 |
| South   | IT      | ML         | Asset2 |
| South   | IT      | ML         | Asset3 |
| South   | IT      | TT         | Asset1 |
| South   | IT      | TT         | Asset2 |
| South   | IT      | TT         | Asset3 |
| South   | ES      | SS         | Asset1 |
| South   | ES      | SS         | Asset2 |
+---------+---------+------------+--------+

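I want to create a new column "Package" whose value is built from a cumulative count based on the following columns:

Publishers
Assets

The result would look like this: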

+---------+---------+------------+--------+---------+
| Cluster | Country | Publishers | Assets | Package |
+---------+---------+------------+--------+---------+
| South   | IT      | SS         | Asset1 | 1       |
| South   | IT      | SS         | Asset2 | 1a      |
| South   | IT      | SS         | Asset3 | 1b      |
| South   | IT      | ML         | Asset1 | 2       |
| South   | IT      | ML         | Asset2 | 2a      |
| South   | IT      | ML         | Asset3 | 2b      |
| South   | IT      | TT         | Asset1 | 3       |
| South   | IT      | TT         | Asset2 | 3a      |
| South   | IT      | TT         | Asset3 | 3b      |
| South   | ES      | SS         | Asset1 | 4       |
| South   | ES      | SS         | Asset2 | 4a      |
+---------+---------+------------+--------+---------+

So far I have tried
df['Package'] = df.groupby(['Cluster', 'Publishers']).cumcount(), but it does not seem to work, because the value resets to 0 each time a publisher's block of rows ends.
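To make the reset concrete, here is a minimal sketch that rebuilds the sample frame from the table above and runs that attempt. The variable name `attempt` is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Cluster": ["South"] * 11,
    "Country": ["IT"] * 9 + ["ES"] * 2,
    "Publishers": ["SS"] * 3 + ["ML"] * 3 + ["TT"] * 3 + ["SS"] * 2,
    "Assets": ["Asset1", "Asset2", "Asset3"] * 3 + ["Asset1", "Asset2"],
})

# cumcount() numbers the rows *within* each group, starting at 0.
# It says nothing about which group a row belongs to.
attempt = df.groupby(["Cluster", "Publishers"]).cumcount()
print(attempt.tolist())
```

The count restarts for ML and TT, and because Country is not among the grouping keys, the two ES/SS rows continue the IT/SS count (3, 4) instead of starting a new package.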

pandas

Source: https://stackoverflow.com/questions/74475243/pandas-cumulative-count-with-labeling

1 answer

sczxawaw1#

You can use groupby.cumcount, but with a different grouper. You also need the related groupby.ngroup, which numbers the groups themselves:

from string import ascii_lowercase

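# group by runs of consecutive identical Publishers values
group = df['Publishers'].ne(df['Publishers'].shift()).cumsum()
# alternatively, group by Cluster/Country/Publishers; in that case pass
# sort=False to groupby so ngroup() numbers groups in order of appearance
# group = ['Cluster', 'Country', 'Publishers']

# suffix lookup: 0 -> '', 1 -> 'a', 2 -> 'b', ... (covers up to 27 rows per group)
suffixes = dict(enumerate([''] + list(ascii_lowercase)))

df['Package'] = (
    # group number, 1-based, as a string
    df.groupby(group).ngroup().add(1).astype(str)
    # position within the group, mapped to a letter suffix
    + df.groupby(group).cumcount().map(suffixes)
)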

Output:

   Cluster Country Publishers  Assets Package
0    South      IT         SS  Asset1       1
1    South      IT         SS  Asset2      1a
2    South      IT         SS  Asset3      1b
3    South      IT         ML  Asset1       2
4    South      IT         ML  Asset2      2a
5    South      IT         ML  Asset3      2b
6    South      IT         TT  Asset1       3
7    South      IT         TT  Asset2      3a
8    South      IT         TT  Asset3      3b
9    South      ES         SS  Asset1       4
10   South      ES         SS  Asset2      4a
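
As a self-contained sketch of the two grouper choices, the snippet below rebuilds the sample frame and compares them. The helper `label` and the variable names are hypothetical, introduced only for illustration. It assumes the column-based grouper is meant to number packages in order of first appearance, which requires `sort=False`; with the default `sort=True`, `ngroup()` numbers groups in sorted key order.

```python
from string import ascii_lowercase

import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Cluster": ["South"] * 11,
    "Country": ["IT"] * 9 + ["ES"] * 2,
    "Publishers": ["SS"] * 3 + ["ML"] * 3 + ["TT"] * 3 + ["SS"] * 2,
    "Assets": ["Asset1", "Asset2", "Asset3"] * 3 + ["Asset1", "Asset2"],
})

# Suffix lookup: 0 -> '', 1 -> 'a', 2 -> 'b', ...
suffixes = dict(enumerate([""] + list(ascii_lowercase)))


def label(frame, grouper, **groupby_kwargs):
    """Hypothetical helper: '<1-based group number><letter suffix>' per row."""
    grouped = frame.groupby(grouper, **groupby_kwargs)
    return grouped.ngroup().add(1).astype(str) + grouped.cumcount().map(suffixes)


# Grouper 1: runs of consecutive identical Publishers values.
runs = df["Publishers"].ne(df["Publishers"].shift()).cumsum()
by_runs = label(df, runs)

# Grouper 2: column names, numbered in order of first appearance.
cols = ["Cluster", "Country", "Publishers"]
by_cols = label(df, cols, sort=False)

# Same columns with the default sort=True: groups follow sorted key order.
by_cols_sorted = label(df, cols)

print(pd.DataFrame({"runs": by_runs, "cols": by_cols, "cols_sorted": by_cols_sorted}))
```

On this data the first two columns agree with the output above, while the sorted variant starts numbering at the ES rows. The two groupers would diverge if the same Cluster/Country/Publishers combination reappeared after a gap: the run-based grouper would open a new package, whereas the column-based one would continue the earlier package.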

Answered 2022-11-20