pandas: trying to dynamically populate columns based on list values and len

Posted by 06odsfpq on 2023-08-01 in Other

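I have been trying to optimize a data operation as much as possible, and it has two parts. I am working with a pandas column in which each value is either a list of varying length or NaN. I want to extract those values and place them in columns according to their position in the list.
Here is a reproducible example, along with what I have coded so far: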

import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
               'last_name': ['George', 'George2', 'George3'],
               'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']]}
             )

How would you get to a result like this?

df_final = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']],
                   'bank_1': ['NaN', 'UK', 'UK'],
                   'bank_2': ['NaN', 'NaN', 'FR']}
                 )


Thank you for your time and help.

Answer #1 (7eumitmz)

You can try this:

import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
               'last_name': ['George', 'George2', 'George3'],
               'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
             )

# split the banking_number column
df[['bank_1','bank_2']] = pd.DataFrame(df['banking_number'].tolist(), index= df.index)

# keep only the first 2 characters (maybe not needed but I wanted to match your expected output)
def get_first_2_char(x):
    # a missing bank (None) is returned unchanged
    return x[:2] if x else x

for col in ['bank_1', 'bank_2']:
    df[col] = df[col].apply(get_first_2_char)

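The row-by-row `apply` above can also be written with pandas' vectorized string accessor. The following is only a sketch of that variant, not part of the original answer; the intermediate names `banks` and `result` are made up for illustration, and it assumes every non-missing entry is a string.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Charles', 'Charles2', 'Charles3'],
    'last_name': ['George', 'George2', 'George3'],
    'banking_number': [[None], ['UK421'], ['UK123', 'FR789']],
})

# Expand each list into its own columns (0, 1, ...); short lists are padded with None
banks = pd.DataFrame(df['banking_number'].tolist(), index=df.index)

# Slice the first two characters of every entry; missing entries stay missing
banks = banks.apply(lambda col: col.str[:2])

# Rename the positional columns 0, 1, ... to bank_1, bank_2, ...
banks.columns = [f'bank_{i + 1}' for i in banks.columns]

# 'result' is a hypothetical name; the original answer assigns into df directly
result = df.join(banks)
```

Building the extra columns in a separate frame and joining at the end leaves `df` untouched, which makes it easier to inspect the intermediate step.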

Edit

Following @itprorh86's remark, here is an updated version that can handle a varying number of banks.

import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
               'last_name': ['George', 'George2', 'George3'],
               'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
             )

# length of the longest list determines how many bank_N columns are needed
max_len = df['banking_number'].apply(len).max()
column_names = [f'bank_{i}' for i in range(1, max_len + 1)]

# split the banking_number column
df[column_names] = pd.DataFrame(df['banking_number'].tolist(), index= df.index)

# keep only the first 2 characters
def get_first_2_char(x):
    # a missing bank (None) is returned unchanged
    return x[:2] if x else x

for col in column_names:
    df[col] = df[col].apply(get_first_2_char)
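The question mentions that a cell may hold NaN rather than a list. In that case `len(x)` would raise a `TypeError`, because a float has no length. One possible way around this, sketched here under the assumption that missing cells are float NaN (the sample data below is adjusted accordingly, and `lists`, `banks` and `result` are illustrative names), is to replace non-list cells with empty lists first:

```python
import numpy as np
import pandas as pd

# Assumed variant of the sample data: the first row holds a bare NaN, not a list
df = pd.DataFrame({
    'first_name': ['Charles', 'Charles2', 'Charles3'],
    'last_name': ['George', 'George2', 'George3'],
    'banking_number': [np.nan, ['UK421'], ['UK123', 'FR789']],
})

# Replace anything that is not a list (e.g. a float NaN) with an empty list
lists = df['banking_number'].apply(lambda x: x if isinstance(x, list) else [])

# The longest list decides how many bank_N columns are created
max_len = int(lists.apply(len).max())
column_names = [f'bank_{i}' for i in range(1, max_len + 1)]

# Short or empty lists are padded with None
banks = pd.DataFrame(lists.tolist(), index=df.index, columns=column_names)
result = df.join(banks)
```

The two-character truncation from the answer can then be applied to `column_names` exactly as before.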

Answer #2 (dw1jzc5e)

Here is an approach that differs slightly from the one suggested by @JulietteB.

import pandas as pd

def parseDF(df_in: pd.DataFrame) -> pd.DataFrame:
    # output columns: the original ones plus bank_1 .. bank_N
    key_cols = list(df_in.columns)
    max_banks = max(len(x) for x in df_in['banking_number'].to_list())
    for nk in range(max_banks):
        key_cols.append(f'bank_{nk+1}')
    rslt = []
    # read from the df_in argument, not from a global df
    for r in range(df_in.shape[0]):
        row = []
        row.append(df_in.iloc[r]['first_name'])
        row.append(df_in.iloc[r]['last_name'])
        b_number = df_in.iloc[r]['banking_number']
        row.append(b_number)
        # rows with fewer banks than max_banks are padded with 'NaN'
        for nw in range(max_banks):
            if nw < len(b_number):
                row.append(b_number[nw])
            else:
                row.append('NaN')
        rslt.append(row)
    return pd.DataFrame(data=rslt, columns=key_cols)

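This approach produces the same columns (with the full values rather than the two-character prefixes) and has the advantage of handling a varying number of banks.
Given the input data, running parseDF(df) produces: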

  first_name last_name  banking_number bank_1 bank_2
0    Charles    George           [NaN]    NaN    NaN
1   Charles2   George2         [UK421]  UK421    NaN
2   Charles3   George3  [UK123, FR789]  UK123  FR789
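For comparison, the same wide layout can be reached without an explicit Python loop by exploding the lists and pivoting on each value's position. This is only a sketch of that alternative, not something proposed in the thread; `long`, `pos`, `wide` and `result` are illustrative names, and missing values come out as None/NaN rather than the string 'NaN'.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Charles', 'Charles2', 'Charles3'],
    'last_name': ['George', 'George2', 'George3'],
    'banking_number': [[None], ['UK421'], ['UK123', 'FR789']],
})

# One row per list element; the original row label is repeated in the index
long = df['banking_number'].explode()

# Position of each element inside its original list: 1, 2, ...
pos = long.groupby(level=0).cumcount() + 1

# Pivot the positions into columns, one output row per original row
wide = pd.DataFrame({
    'row': long.index,
    'pos': pos.to_numpy(),
    'value': long.to_numpy(),
}).pivot(index='row', columns='pos', values='value')
wide.columns = [f'bank_{i}' for i in wide.columns]

result = df.join(wide)
```

The number of `bank_N` columns follows from the longest list, so no maximum length has to be computed up front.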
