如何使用Pandas将行数据转换为列数据?

zwghvu4y  于 2022-12-02  发布在  其他
关注(0)|答案(2)|浏览(205)

I exported many reports from my system in xls in the same specific format and need to change them to another format:
Basically for every item description I need to insert the corresponding Account series it is in column J using pandas.
| Data | CP | N0 | N1 | ITEM | DEBIT | CREDIT | NET | D/C |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| Account: (663) | | | | | | | | |
| 31/10/2022 | 595 | | 12 | ITEM DESCRIPTION 4859 | 5.564,40 | | 59.786,28 | C |
| Account: (664) | | | | | | | | |
| 31/10/2022 | 596 | | 12 | ITEM DESCRIPTION 234243 | 3.475,34 | | 15.492,41 | D |
| 31/10/2022 | 103 | | 14 | ITEM DESCRIPTION 456456 | | 0,01 | 15.492,40 | C |
| Account: (678) | | | | | | | | |
| 31/10/2022 | 597 | | 12 | ITEM DESCRIPTION 2332 | 6.555,27 | | 71.503,39 | C |
| Account: (689) | | | | | | | | |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 66546 | 266.516,00 | | 504.013,87 | D |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 57567 | 5.578,67 | | 7.656.192,54 | D |
| Account: (500) | | | | | | | | |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 345345 | 54.405,00 | | 645.175,00 | D |
I tried to write a script but couldn't fetch a logic to fill the column. Could someone help me?
Desired format:
| Data | CP | N0 | N1 | ITEM | DEBIT | CREDIT | NET | D/C | Account |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| Account: (663) | | | | | | | | | |
| 31/10/2022 | 595 | | 12 | ITEM DESCRIPTION 4859 | 5.564,40 | | 59.786,28 | C | Account: (663) |
| Account: (664) | | | | | | | | | |
| 31/10/2022 | 596 | | 12 | ITEM DESCRIPTION 234243 | 3.475,34 | | 15.492,41 | D | Account: (664) |
| 31/10/2022 | 103 | | 14 | ITEM DESCRIPTION 456456 | | 0,01 | 15.492,40 | C | Account: (664) |
| Account: (678) | | | | | | | | | |
| 31/10/2022 | 597 | | 12 | ITEM DESCRIPTION 2332 | 6.555,27 | | 71.503,39 | C | Account: (678) |
| Account: (689) | | | | | | | | | |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 66546 | 266.516,00 | | 504.013,87 | D | Account: (689) |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 57567 | 5.578,67 | | 7.656.192,54 | D | Account: (689) |
| Account: (500) | | | | | | | | | |
| 31/10/2022 | 608 | | 13 | ITEM DESCRIPTION 345345 | 54.405,00 | | 645.175,00 | D | Account: (500) |

5t7ly7z5

5t7ly7z51#

试试这个:

mask = df['Data'].str.startswith('Account')
df['Account'] = df.groupby(mask.cumsum())['Data'].transform('first').mask(mask)
print(df)

# df data like this:

data = [{'Data': 'Account: (663)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 595.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 4859',
  'DEBIT': '5.564,40',
  'CREDIT': 'nan',
  'NET': '59.786,28',
  'D/C': 'C',
  'Account': 'Account: (663)'},
 {'Data': 'Account: (664)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 596.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 234243',
  'DEBIT': '3.475,34',
  'CREDIT': 'nan',
  'NET': '15.492,41',
  'D/C': 'D',
  'Account': 'Account: (664)'},
 {'Data': '31/10/2022',
  'CP': 103.0,
  'N0': 'nan',
  'N1': 14.0,
  'ITEM': 'ITEM DESCRIPTION 456456',
  'DEBIT': 'nan',
  'CREDIT': '0,01',
  'NET': '15.492,40',
  'D/C': 'C',
  'Account': 'Account: (664)'},
 {'Data': 'Account: (678)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 597.0,
  'N0': 'nan',
  'N1': 12.0,
  'ITEM': 'ITEM DESCRIPTION 2332',
  'DEBIT': '6.555,27',
  'CREDIT': 'nan',
  'NET': '71.503,39',
  'D/C': 'C',
  'Account': 'Account: (678)'},
 {'Data': 'Account: (689)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 66546',
  'DEBIT': '266.516,00',
  'CREDIT': 'nan',
  'NET': '504.013,87',
  'D/C': 'D',
  'Account': 'Account: (689)'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 57567',
  'DEBIT': '5.578,67',
  'CREDIT': 'nan',
  'NET': '7.656.192,54',
  'D/C': 'D',
  'Account': 'Account: (689)'},
 {'Data': 'Account: (500)',
  'CP': 'nan',
  'N0': 'nan',
  'N1': 'nan',
  'ITEM': 'nan',
  'DEBIT': 'nan',
  'CREDIT': 'nan',
  'NET': 'nan',
  'D/C': 'nan',
  'Account': 'nan'},
 {'Data': '31/10/2022',
  'CP': 608.0,
  'N0': 'nan',
  'N1': 13.0,
  'ITEM': 'ITEM DESCRIPTION 345345',
  'DEBIT': '54.405,00',
  'CREDIT': 'nan',
  'NET': '645.175,00',
  'D/C': 'D',
  'Account': 'Account: (500)'}]

df = pd.DataFrame(data).replace('nan', float('nan'))
vyu0f0g1

vyu0f0g12#

这不是最“复杂”的方法,但因为找到最有效的方法在这里可能并不重要,所以应该可以很好地工作:

data_col = df['Data']

def find_last_acct_entry(date):
    idx = data_col.index(date)
    ret = idx
    while not data_col[ret].startswith('Account') and ret >= 1:
        ret -= 1
    return data_col[ret]

for idx, row in df.iterrows():
    if not row['Data'].startswith('Account'):
        acct = find_last_acct_entry(row['Data'])
        df.loc[idx, 'Account'] = acct

相关问题