I have a DataFrame like this (the paths are very long, so the layout is not great):
api_spec_id paths.modified components.schemas.added
276 779.0
277 779.0 {'/system/identity/v2beta1/principals': {'operations': {'modified': {'GET': {'description': {'from': 'Returns the list of principals known to IAC.', 'to': 'Returns the list of principals that the Identity service knows about.'}}}}}}
278 779.0 {'/{tenant}/identity/v2beta1/validate': {'operations': {'modified': {'GET': {'parameters': {'modified': {'query': {'include': {'schema': {'uniqueItems': {'from': False, 'to': True}, 'maxItems': {'from': None, 'to': 2}, 'items': {'enum': {'added': ['tenant', 'principal']}}}}}}}, 'responses': {'modified': {'200': {'content': {'mediaTypeModified': {'application/json': {'schema': {'required': {'stringsdiff': {'added': ['kind']}}, 'properties': {'added': ['kind']}}}}}}}}}}}}} ['AddGroupMemberAsAnAdminBody', 'ServiceAccountPeer', 'ServiceAccounts', 'ServiceAccount', 'ServiceAccountPeers']
279 779.0
280 779.0 ['CreateSvcPrincipalTokenBody', 'AdminServicePrincipal']
281 779.0 ['UpdateTenantBody', 'TenantAsAdmin']
283 779.0 ['UpdateTenantBody', 'TenantAsAdmin']
284 779.0 ['OAuth2Client']
285 779.0 {'/{tenant}/identity/v2beta1/groups': {'operations': {'modified': {'GET': {'parameters': {'added': {'query': ['access']}}}}}}, '/{tenant}/identity/v2beta1/members/{member}/permissions': {'operations': {'modified': {'GET': {'responses': {'modified': {'200': {'headers': {'added': ['Cache-Control']}}}}}}}}, '/{tenant}/identity/v2beta1/validate': {'operations': {'modified': {'GET': {'responses': {'modified': {'200': {'content': {'mediaTypeModified': {'application/json': {'schema': {'properties': {'modified': {'tenant': {'properties': {'modified': {'status': {'enum': {'added': ['tombstoned']}}}}}}}}}}}}}}}}}}} ['OktaApp']
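I want to count the comma-separated strings in each column, so the expected output should look like this (I have truncated the original column values because they are too long to display):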
api_spec_id paths.modified components.schemas.added Paths.Modified Count Component.schemas.added.count
276 779.0
277 779.0 {'/system/identity/v2beta1/ 1 0
278 779.0 {'/{tenant}/identity/v2beta1/ ['AddGroupMemberAsAnAdminBody..] 5 5
279 779.0
280 779.0 [CreateSvcPrincipalTokenBody'..] 0 0
281 779.0 ['UpdateTenantBody', 'TenantAsAdmin'] 0 2
283 779.0 ['UpdateTenantBody','TenantAsAdmin'] 0 2
284 779.0 ['OAuth2Client'] 0 1
285 779.0 {'/{tenant}/identity/v2beta1/ ['OktaApp'] 3 1
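I tried using urllib.parse to extract the individual strings and then take a unique count, but that only works for a few columns, because the strings start in different formats: they can begin with [{ or https. I am not sure how to achieve this, and I would appreciate any suggestions.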
1 answer
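If the column values are just strings with comma separators, you can count the items directly with the pandas string methods (the Series.str accessor), and the same applies to the other column(s).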
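Since the cells in the question look like Python lists and dicts (or their string form) rather than flat comma-separated text, a plain comma count would also pick up commas nested inside the dicts. Below is a minimal sketch of one way to count only the top-level entries, which appears to be what the expected output shows for rows such as 277 and 285. The helper name count_items is hypothetical, and the sketch assumes that string cells are valid Python literals; anything that cannot be parsed falls back to a simple comma split.

```python
import ast
import math


def count_items(value):
    """Count top-level entries in a cell holding a list, a dict, or their string form.

    Hypothetical helper for illustration; not taken from the original answer.
    """
    # Missing cells (None or NaN) count as zero items.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return 0
        try:
            # Safely parse literal text such as "['a', 'b']" or "{'k': 1}".
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            # Not a Python literal: fall back to counting comma-separated pieces.
            return len([part for part in text.split(",") if part.strip()])
    if isinstance(value, (list, tuple, set, dict)):
        # For a dict this is the number of top-level keys.
        return len(value)
    # Any other scalar counts as a single item.
    return 1


print(count_items("['UpdateTenantBody', 'TenantAsAdmin']"))  # 2
print(count_items("{'/a': {'x': 1, 'y': 2}, '/b': {}}"))     # 2
print(count_items(float("nan")))                             # 0
```

Assuming the DataFrame is called df, the helper could be applied per column with something like df['paths.modified'].map(count_items) and the result assigned to a new count column; the same call works for components.schemas.added.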