Python: repeat a string as many times as the number in front of it indicates

sd2nnvve, posted on 2023-02-21 in Python

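I am trying to build a matrix that assigns 0s and 1s to the different roles, so that I can match two different DataFrames. However, I am stuck on the format of the data: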

data = {'Location': {0: 'Madrid',
  1: 'Barcelona',
  2: 'Paris',
  3: 'London ',
  4: 'New York',
  5: 'Berlin',
  6: 'Birminham',
  7: 'Tanzania'},
 'Description': {0: 'M3',
  1: 'P5',
  2: 'M3P5',
  3: 'M3',
  4: 'M3P5T8',
  5: 'P5T8',
  6: '',
  7: 'FT7 M3'},
 'Branch_A': {0: 'Auditor or Auditor(S), Accountant or Accountant(S), PayRoll_Manager, 2 Brand_Manager, 3 IT_Support, Business_analyst, Developer, Cyber_security',
  1: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, 2 Developer, Cyber_security',
  2: 'Auditor or Auditor(S), 2 Accountant, Business_analyst, Developer, Cyber_security',
  3: "Auditor or Auditor(S), Accountant, PayRoll_Manager, 3 Brand_Manager, 2 IT_Support, Business_analyst, Developer, Cyber_security'",
  4: 'Auditor or Auditor(S), Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, IT_Support, Business_analyst, Developer, Cyber_security',
  5: 'Auditor or Auditor(S), 2 PayRoll_Manager, Brand_Manager, 2 Business_analyst, Developer, Cyber_security',
  6: '----',
  7: 'Auditor or Auditor(S), IT_Support, Business_analyst, Developer, Cyber_security'},
 'Branch_B': {0: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
  1: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
  2: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
  3: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager',
  4: '',
  5: 'Accountant or Accountant(S), PayRoll_Manager, Brand_Manager, Developer',
  6: '',
  7: ''},
 'Branch_C': {0: 'IT_Support, Business_analyst, Developer, Cyber_security',
  1: 'IT_Support, Business_analyst, Developer, Cyber_security',
  2: 'IT_Support, Business_analyst, Developer, Cyber_security',
  3: '',
  4: 'IT_Support, Business_analyst, Developer, Cyber_security',
  5: 'IT_Support, Business_analyst, Developer, Cyber_security',
  6: '----',
  7: ''}}

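I managed to turn each cell into a list with one entry per role (note that {pos} builds a one-element set, not a dictionary), so that I can split the roles apart: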

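import pandas as pd

df = pd.DataFrame(data)  # build the frame from the dict above

def extract_data(row):
    # Split the comma-separated roles; each piece is wrapped in a one-element set
    positions = row['Branch_A'].split(',')
    result = []
    for pos in positions:
        result.append({pos})
    return result

df['Branch_A'] = df.apply(extract_data, axis=1)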

This gives:

df['Branch_A']
0    [{Auditor or Auditor(S)}, { Accountant or Acco...
1    [{Accountant or Accountant(S)}, { PayRoll_Mana...
2    [{Auditor or Auditor(S)}, { 2 Accountant}, { B...
3    [{Auditor or Auditor(S)}, { Accountant}, { Pay...
4    [{Auditor or Auditor(S)}, { Accountant or Acco...
5    [{Auditor or Auditor(S)}, { 2 PayRoll_Manager}...
6                                             [{----}]
7    [{Auditor or Auditor(S)}, { IT_Support}, { Bus...

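What I am trying to do is this: if a role has a number in front of it, repeat that role that many times. My plan is to fill these roles with employees from another DataFrame, but I cannot find code that understands that I want 2 PayRoll_Manager. Also, in the workers DataFrame some workers are listed as Auditor and others as Auditor(S). Is there a way to write code that handles the "or"?
Thanks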

Answer #1 (agxfikkp)

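It is not entirely clear what you expect, but you can use the following code as a starting point to extract how many of each role every location needs: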

import pandas as pd
df2 = pd.DataFrame(data)  # assumption: a fresh frame built from the question's dict
pat = r'(?P<Number>\d*)?\s*(?P<Role>[^,]+),?\s*'
out = df2.melt(['Location', 'Description'], var_name='Branch', value_name='Positions')
out = out.join(out.pop('Positions').str.extractall(pat).fillna(1).droplevel(1))

Output:

>>> out
     Location Description    Branch Number                         Role
0      Madrid          M3  Branch_A      1        Auditor or Auditor(S)
0      Madrid          M3  Branch_A      1  Accountant or Accountant(S)
0      Madrid          M3  Branch_A      1              PayRoll_Manager
0      Madrid          M3  Branch_A      2                Brand_Manager
0      Madrid          M3  Branch_A      3                   IT_Support
..        ...         ...       ...    ...                          ...
21     Berlin        P5T8  Branch_C      1             Business_analyst
21     Berlin        P5T8  Branch_C      1                    Developer
21     Berlin        P5T8  Branch_C      1               Cyber_security
22  Birminham              Branch_C      1                         ----
23   Tanzania      FT7 M3  Branch_C    NaN                          NaN

[88 rows x 5 columns]
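
The output above only records the count next to each role; nothing is repeated yet. Below is a minimal sketch of how the remaining steps might look, under a few assumptions: it uses a two-row stand-in for the question's data, converts Number with pd.to_numeric instead of fillna(1), and then repeats each row Number times with Index.repeat. The "or" is handled by splitting Role on ' or ' into a list of accepted titles. The workers frame, its Name and Title columns, and the names seats, Accepts and Seat are all hypothetical and only serve the illustration.

```python
import pandas as pd

# Stand-in for two rows of the question's data (role strings shortened).
df2 = pd.DataFrame({
    'Location': ['Madrid', 'Berlin'],
    'Branch_A': [
        'Auditor or Auditor(S), 2 Brand_Manager, 3 IT_Support',
        'Auditor or Auditor(S), 2 PayRoll_Manager',
    ],
})

# Same pattern as in the answer: optional count, then the role name.
pat = r'(?P<Number>\d*)?\s*(?P<Role>[^,]+),?\s*'
out = df2.melt(['Location'], var_name='Branch', value_name='Positions')
out = out.join(out.pop('Positions').str.extractall(pat).droplevel(1))

# Drop cells that had no role at all, and make the index unique again
# (after the join, several rows share the same index label).
out = out.dropna(subset=['Role']).reset_index(drop=True)

# A missing count means "one"; convert everything to int.
out['Number'] = pd.to_numeric(out['Number']).fillna(1).astype(int)

# Repeat every row Number times: one row per seat to fill.
seats = out.loc[out.index.repeat(out['Number'])].reset_index(drop=True)

# "Auditor or Auditor(S)" -> ['Auditor', 'Auditor(S)']
seats['Accepts'] = seats['Role'].str.strip().str.split(' or ')

# Hypothetical workers frame: each worker has exactly one title.
workers = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Title': ['Auditor(S)', 'Brand_Manager', 'Developer'],
})

# One row per (seat, accepted title), then keep the pairs where a
# worker's title is among the titles the seat accepts.
candidates = (
    seats.rename_axis('Seat').reset_index()
         .explode('Accepts')
         .merge(workers, left_on='Accepts', right_on='Title')
)
print(seats[['Location', 'Branch', 'Role']])
print(candidates[['Seat', 'Location', 'Role', 'Name']])
```

candidates only lists which workers could fill which seats; choosing one worker per seat (and not using the same worker twice) is a separate assignment step that this sketch does not attempt.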
