Extracting versions from a pandas column

jhkqcmku  posted on 2022-11-27  in Other

I have a DataFrame column that looks like this:

paths                    
0      ['/api/v2/clouds', '/api/v2/clouds/{cloud}']                      
1      ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]                
2      ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]                
3      ['/v3/attachments/{attachmentId}', '/v3/attachments]                
4      '/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]

I want to extract the versions from this column.
My desired output is:

paths                    Path_Version 
0      ['/api/v2/clouds', '/api/v2/clouds/{cloud}']                      v2   
1      ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]             v0.1   
2      ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]              v1  
3      ['/v3/attachments/{attachmentId}', '/v3/attachments]              v3  
4      ['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]      v0.1/v0.2/v0.3

I tried this:

keywords = ['v1', 'v2', 'v3', 'v4', 'v1.0', 'v1.2', 'v1.1', 'v0.1', 'v0.2','v1.3', 'v1.4', 'v3.1', 'v3.2', '0.1.0', '3.1', 'v0.0.2', 'v0.0.3', 'v0.0.4', '1.0.0']
final_api['Path_Version'] = final_api['paths'].str.findall('(' + '|'.join(keywords) + ')')

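But this did not produce any results. I have also looked at other code, but none of it gave me the output I want. I'm struggling to figure this out, and any help would be appreciated.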
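
Whatever the reason the attempt returned nothing, a keyword alternation like the one above has a known pitfall: the regex engine tries alternatives from left to right, so 'v1' wins before 'v1.0' is ever tried, and an unescaped '.' matches any character. The sketch below illustrates this with plain `re`; the shortened keyword list and the sample path '/v1.0/items' are hypothetical, chosen only for the demonstration.

```python
import re

# Hypothetical, shortened subset of the keyword list from the question
keywords = ['v1', 'v1.0', 'v0.1', 'v3.1']

# Pattern built the same way as in the attempt
naive = '(' + '|'.join(keywords) + ')'
print(re.findall(naive, '/v1.0/items'))  # 'v1' is tried first and matches

# Longest keywords first, with '.' escaped so it only matches a literal dot
ordered = sorted(keywords, key=len, reverse=True)
safe = '(' + '|'.join(re.escape(k) for k in ordered) + ')'
print(re.findall(safe, '/v1.0/items'))
```

With the longest-first, escaped pattern the full 'v1.0' is captured instead of the shorter prefix.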

axkjgtzd1#

There is no need for a keyword list; just use pandas.Series.str.findall, as you started to:

df["Path_Version"]= (
                        df["paths"].str.findall(r"(v\d\.?\d?)")
                                   .apply(lambda x: "/".join(set(x)))
                    )
# Output:
print(df.to_string())
                                                          paths    Path_Version
0                  ['/api/v2/clouds', '/api/v2/clouds/{cloud}']              v2
1         ['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists]            v0.1
2          ['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}'....]              v1
3          ['/v3/attachments/{attachmentId}', '/v3/attachments]              v3
4  '/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents]  v0.2/v0.3/v0.1
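
A set does not preserve order, which is why the last row above prints as v0.2/v0.3/v0.1 rather than v0.1/v0.2/v0.3. A minimal sketch of one way to keep first-seen order, assuming the paths column holds strings (which str.findall operates on) and using two shortened sample rows: `dict.fromkeys` removes duplicates while keeping the order in which versions appear.

```python
import pandas as pd

# Two sample rows stored as strings, shortened from the question's data
df = pd.DataFrame({
    "paths": [
        "['/api/v2/clouds', '/api/v2/clouds/{cloud}']",
        "['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']",
    ]
})

df["Path_Version"] = (
    df["paths"].str.findall(r"(v\d\.?\d?)")
               # dict.fromkeys drops duplicates but keeps first-seen order
               .apply(lambda x: "/".join(dict.fromkeys(x)))
)
print(df["Path_Version"].tolist())
```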

gcuhipw92#

This looks like a good candidate for a regular expression:

import pandas as pd
import re

data = [
      [['/api/v2/clouds', '/api/v2/clouds/{cloud}']],
      [['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists']],
      [['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}']],
      [['/v3/attachments/{attachmentId}', '/v3/attachments']],
      [['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']]
]

df = pd.DataFrame(data, columns=['paths'])

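# Match a version segment such as /v2/ or /v0.1/ inside a path
ver = re.compile(r'/(v\d(\.\d)?)/')
def getver(row):
    # Collect the distinct versions found across all paths in the row
    vsets = set()
    for p in row:
        chk = ver.search(p)
        if chk:  # skip paths that contain no version segment
            vsets.add(chk.group(1))
    return '/'.join(vsets)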

df['Version'] = df.paths.apply(getver)
print(df)

Output:

paths         Version
0           [/api/v2/clouds, /api/v2/clouds/{cloud}]              v2
1  [/v0.1/book-lists/{type}/{date}, /v0.1/book-li...            v0.1
2       [/v1/Video/Rooms, /v1/Video/Rooms/{RoomSid}]              v1
3  [/v3/attachments/{attachmentId}, /v3/attachments]              v3
4  [/v0.1/patrons, /v0.2/patrons, /v0.3/patrons/d...  v0.2/v0.3/v0.1
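
As a possible alternative to the row-wise function, here is a sketch that assumes the column holds real Python lists, as in the data above. It explodes each list into one path per row, extracts the version with `str.extract`, and joins the distinct versions per original row in first-seen order. The broader pattern (multi-digit versions, and a version at the very end of a path) is an assumption that goes beyond the regex used in the answer.

```python
import pandas as pd

data = [
    [['/api/v2/clouds', '/api/v2/clouds/{cloud}']],
    [['/v0.1/book-lists/{type}/{date}', '/v0.1/book-lists']],
    [['/v1/Video/Rooms', '/v1/Video/Rooms/{RoomSid}']],
    [['/v3/attachments/{attachmentId}', '/v3/attachments']],
    [['/v0.1/patrons', '/v0.2/patrons', '/v0.3/patrons/dependents']],
]
df = pd.DataFrame(data, columns=['paths'])

# One path per row; the original row label is repeated in the index
exploded = df['paths'].explode()

# Capture e.g. v2, v0.1 or v10.2.3 when followed by '/' or the end of the path
versions = exploded.str.extract(r'/(v\d+(?:\.\d+)*)(?=/|$)', expand=False)

# Join the distinct versions per original row, keeping first-seen order
df['Version'] = versions.groupby(level=0).agg(
    lambda s: '/'.join(dict.fromkeys(s.dropna()))
)
print(df['Version'].tolist())
```

Paths with no version segment become NaN after the extract step and are dropped before joining, so a row with no versions at all yields an empty string.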
