Python: count the total number of strings in a pandas DataFrame

mi7gmzs6  posted on 2023-03-11  in  Python

I have a DataFrame like this (the paths are long, so the alignment is not great):

api_spec_id        paths.modified                                           components.schemas.added
276 779.0       
277 779.0   {'/system/identity/v2beta1/principals': {'operations': {'modified': {'GET': {'description': {'from': 'Returns the list of principals known to IAC.', 'to': 'Returns the list of principals that the Identity service knows about.'}}}}}}    
278 779.0   {'/{tenant}/identity/v2beta1/validate': {'operations': {'modified': {'GET': {'parameters': {'modified': {'query': {'include': {'schema': {'uniqueItems': {'from': False, 'to': True}, 'maxItems': {'from': None, 'to': 2}, 'items': {'enum': {'added': ['tenant', 'principal']}}}}}}}, 'responses': {'modified': {'200': {'content': {'mediaTypeModified': {'application/json': {'schema': {'required': {'stringsdiff': {'added': ['kind']}}, 'properties': {'added': ['kind']}}}}}}}}}}}}} ['AddGroupMemberAsAnAdminBody', 'ServiceAccountPeer', 'ServiceAccounts', 'ServiceAccount', 'ServiceAccountPeers']
279 779.0       
280 779.0                                                              ['CreateSvcPrincipalTokenBody', 'AdminServicePrincipal']
281 779.0                                                              ['UpdateTenantBody', 'TenantAsAdmin']
283 779.0                                                              ['UpdateTenantBody', 'TenantAsAdmin']
284 779.0                                                              ['OAuth2Client']
285 779.0   {'/{tenant}/identity/v2beta1/groups': {'operations': {'modified': {'GET': {'parameters': {'added': {'query': ['access']}}}}}}, '/{tenant}/identity/v2beta1/members/{member}/permissions': {'operations': {'modified': {'GET': {'responses': {'modified': {'200': {'headers': {'added': ['Cache-Control']}}}}}}}}, '/{tenant}/identity/v2beta1/validate': {'operations': {'modified': {'GET': {'responses': {'modified': {'200': {'content': {'mediaTypeModified': {'application/json': {'schema': {'properties': {'modified': {'tenant': {'properties': {'modified': {'status': {'enum': {'added': ['tombstoned']}}}}}}}}}}}}}}}}}}}    ['OktaApp']

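I want to count the number of comma-separated strings in each column, so the expected output should look like this (I have truncated the original column values because they are too long to display):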

api_spec_id        paths.modified                      components.schemas.added            Paths.Modified Count       Component.schemas.added.count

276 779.0       
277 779.0   {'/system/identity/v2beta1/                                                        1                            0
278 779.0   {'/{tenant}/identity/v2beta1/      ['AddGroupMemberAsAnAdminBody..]                5                            5
279 779.0            
280 779.0                                      [CreateSvcPrincipalTokenBody'..]                0                            0   
281 779.0                                      ['UpdateTenantBody', 'TenantAsAdmin']           0                            2               
283 779.0                                      ['UpdateTenantBody','TenantAsAdmin']            0                            2
284 779.0                                      ['OAuth2Client']                                0                            1
285 779.0   {'/{tenant}/identity/v2beta1/      ['OktaApp']                                     3                            1

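I tried using urllib.parse to extract the individual strings and then compute a unique count, but that only works for a few columns, because the strings do not all start the same way: they can begin with [, { or https. I'm not sure how to achieve this and would appreciate any suggestions.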


tvz2xvvm  #1

If the column values are just strings with comma separators, you can use:

# .str.len() leaves missing cells as NaN, where .map(len) would raise a TypeError on them
df['paths.modified.count'] = df['paths.modified'].str.split(',').str.len()

and similarly for the other column (or columns).
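
To make the "other columns" part concrete, here is a minimal sketch of the same comma-split idea applied to both columns, with empty cells counted as 0. It assumes the cells are plain strings with NaN or an empty string where nothing is present; the sample frame and the helper name count_comma_parts are made up for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
# Assumption: cells are plain strings, with NaN or "" where empty.
df = pd.DataFrame(
    {
        "api_spec_id": [779.0, 779.0, 779.0, 779.0],
        "paths.modified": [np.nan, "{'/a': 1, '/b': 2}", "", "{'/c': 3}"],
        "components.schemas.added": [
            "['UpdateTenantBody', 'TenantAsAdmin']",
            np.nan,
            "['OAuth2Client']",
            "",
        ],
    }
)


def count_comma_parts(column: pd.Series) -> pd.Series:
    """Count comma-separated pieces per cell; missing or blank cells give 0."""
    text = column.fillna("").astype(str).str.strip()
    counts = text.str.split(",").str.len()
    # "".split(",") yields [""], so blank cells would otherwise count as 1.
    return counts.where(text != "", 0).astype(int)


# Apply the same helper to every column that needs a count.
for name in ["paths.modified", "components.schemas.added"]:
    df[name + ".count"] = count_comma_parts(df[name])

print(df[["paths.modified.count", "components.schemas.added.count"]])
```

Note that this counts pieces of text between commas, not entries of the nested structure: a cell such as {'/a': {'from': 1, 'to': 2}} would count as 2 even though it holds a single path, so the result may differ from the per-path counts in the expected output above.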
