pandas: occurrence of more than one field, grouped by label

Posted by py49o6xq on 2023-04-28 in Other

Apologies if this is a possible duplicate. I have a data frame that looks like this:

label          api_spec_id             content                              
             375.0  
             375.0  
             375.0        Request Parameter Removed, Field type missing, Violation
             375.0        Path Removed w/o Deprecation
             385.0  
minor        385.0        Request Type Change,Removed param, Interface missing
patch        395.0        Path Removed w/o Deprecation
patch        395.0        Path Removed w/o Deprecation
minor        400.0        New Required Request Property
minor        400.0        Response Success State Removed, Violation
major        400.0        Field type changed

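For each label category, I want to count the number of unique api_spec_id values whose content contains more than one field (fields are always separated by commas).
The expected output is therefore:
patch: 0
minor: 2
major: 0
NaN: 1
Any suggestions would be appreciated.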

pandas

Source: https://stackoverflow.com/questions/76086576/occurence-of-more-than-one-fields-grouped-by-label

1 Answer

utugiqy61#

You can use str.contains to flag the strings that contain a comma, filter with the resulting boolean Series, and then apply groupby.nunique:

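# keep only rows whose content contains a comma (missing content counts as False)
out = (df[df['content'].str.contains(',', na=False)]
       # count distinct api_spec_id per label, keeping the NaN label as its own group
       .groupby('label', dropna=False)['api_spec_id'].nunique()
       # re-add labels that had no matching rows, with a count of 0
       .reindex(df['label'].unique(), fill_value=0)
      )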

Or:

out = (
 # mask api_spec_id to NaN where content has no comma; nunique ignores NaN
 df['api_spec_id'].where(df['content'].str.contains(',', na=False))
 .groupby(df['label'], dropna=False).nunique()
)

Output (first variant; the second returns the same counts with the labels in sorted order):

label
NaN      1
minor    2
patch    0
major    0
Name: api_spec_id, dtype: int64
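
For anyone who wants to try this, here is a minimal, self-contained sketch of the first approach. The DataFrame is retyped by hand from the question, and it assumes the blank label and content cells are NaN (the question does not say how they are stored). The name has_multiple is just an illustrative variable, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "label": [np.nan] * 5 + ["minor", "patch", "patch", "minor", "minor", "major"],
    "api_spec_id": [375.0, 375.0, 375.0, 375.0, 385.0, 385.0,
                    395.0, 395.0, 400.0, 400.0, 400.0],
    "content": [
        np.nan,
        np.nan,
        "Request Parameter Removed, Field type missing, Violation",
        "Path Removed w/o Deprecation",
        np.nan,
        "Request Type Change,Removed param, Interface missing",
        "Path Removed w/o Deprecation",
        "Path Removed w/o Deprecation",
        "New Required Request Property",
        "Response Success State Removed, Violation",
        "Field type changed",
    ],
})

# True where content contains a literal comma, i.e. more than one field.
# na=False makes missing content count as "no comma".
has_multiple = df["content"].str.contains(",", regex=False, na=False)

out = (
    df[has_multiple]
    # dropna=False keeps rows with a missing label as their own group
    .groupby("label", dropna=False)["api_spec_id"].nunique()
    # labels with no multi-field rows are re-added with a count of 0
    .reindex(df["label"].unique(), fill_value=0)
)
print(out)
```

Under those assumptions this should reproduce the counts shown above. Passing regex=False is optional here, since a comma has no special meaning in a regular expression, but it makes the intent of a literal match explicit.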
