How do I combine columns in a pandas DataFrame?

pzfprimi, posted on 2023-05-05 in Other

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    'name_1': ['Juan', '', ''],
    'name_2': ['', 'Pedro', ''],
    'name_3': ['', '', 'Ana'],
    'l_name': ['García', 'Sánchez', 'Hernández'],
    'profession_4': ['Doctor', 'Doctor', ''], 
    'profession_5': ['', '', 'architect'],
    'hobbie_6': ['Dance', '', 'Music'],
    'hobbie_7': ['', 'Music', 'Paint'],
    'hobbie_8': ['', '', 'Dance'],
})
df

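Several columns share the same base name (they differ only in the numeric suffix), so I want to merge each set of such columns into a single combined column.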

To do this, I wrote the following code:

# Group the columns by their name before the underscore
grouped_columns = df.columns.to_series().groupby(lambda x: x.rsplit('_', 1)[0]).apply(list).tolist()

# Iterate through each group of columns and combine them
for columns in grouped_columns:
    # Get the name of the group
    group_name = columns[0].rsplit('_', 1)[0]
    # Combine the columns into a new column with the name of the group
    df[group_name + '_combined'] = pd.concat([df[column] for column in columns], axis=1).apply(lambda x: '/'.join(x.dropna().astype(str)), axis=1)
    
# Drop the original columns
df.drop(df.filter(regex=r'_\d+$').columns, axis=1, inplace=True)

# Display the resulting DataFrame
df

But the resulting table has its columns out of order, and a / also appears in places where there is no data.

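How can I improve my code so that the columns come out in order and without the unnecessary /? That is, when an empty cell is combined with a filled one, the new column should contain only the existing value, not an empty value plus a /.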
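As a minimal sketch of where the stray / comes from: the empty cells in this DataFrame are empty strings, not missing values, so `dropna()` keeps them and `'/'.join` still places a separator around each one. The variable names below are illustrative only. (The column order changes for a separate reason: `groupby` sorts its group keys by default, and the new columns are appended after the existing ones.)

```python
import pandas as pd

# One row of the name_* columns from the question
row = pd.Series(['', 'Pedro', ''])

# dropna() only removes missing values (NaN/None/pd.NA);
# '' is an ordinary string, so it survives and gets its own separator
with_blanks = '/'.join(row.dropna().astype(str))

# Filtering out empty strings before joining leaves only the real value
without_blanks = '/'.join(row[row.ne('')])

print(repr(with_blanks))
print(repr(without_blanks))
```

Either filtering with `ne('')` or first replacing `''` with a missing value makes the join skip the empty cells.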

fxnxkyjh #1

You can normalize your column names by removing the numeric suffix (with str.replace), then run groupby.agg over the columns:

(df.groupby(df.columns.str.replace(r'_\d+$', '', regex=True), axis=1, sort=False)
   .agg(lambda g: g.apply(lambda x: '/'.join(x[x.ne('')]), axis=1))
)

Alternatively:

import numpy as np

(df.set_axis(df.columns.str.replace(r'_\d+$', '', regex=True), axis=1)
   .replace('', np.nan).stack()
   .groupby(level=[0,1], sort=False).agg('/'.join)
   .unstack()
)

Output:

    name     l_name profession             hobbie
0   Juan     García     Doctor              Dance
1  Pedro    Sánchez     Doctor              Music
2    Ana  Hernández  architect  Music/Paint/Dance
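Grouping with `axis=1` is deprecated in recent pandas releases (2.1 onward, as far as I know). The following is a sketch of the same idea that groups the transposed frame instead; it is an adaptation for illustration, not the answer's original code, and the names `keys` and `out` are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'name_1': ['Juan', '', ''],
    'name_2': ['', 'Pedro', ''],
    'name_3': ['', '', 'Ana'],
    'l_name': ['García', 'Sánchez', 'Hernández'],
    'profession_4': ['Doctor', 'Doctor', ''],
    'profession_5': ['', '', 'architect'],
    'hobbie_6': ['Dance', '', 'Music'],
    'hobbie_7': ['', 'Music', 'Paint'],
    'hobbie_8': ['', '', 'Dance'],
})

# Base names: strip a trailing "_<digits>" suffix from each column name
keys = df.columns.str.replace(r'_\d+$', '', regex=True)

out = (
    df.T                                      # original columns become rows
      .groupby(keys, sort=False)              # group rows by base name, keep first-seen order
      .agg(lambda s: '/'.join(s[s.ne('')]))   # join only the non-empty strings
      .T                                      # back to one row per record
)
print(out)
```

Transposing twice copies the data, so for very wide or very long frames a plain loop over the column groups may be a better fit.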

6pp0gazn #2

Another option with groupby (mozway's approach is shorter and should perform better):

(df
.groupby(df.columns.str.split('_').str[0], axis = 1)
.agg(lambda f: f.iloc[:, 0].str.cat(f.iloc[:, 1:], sep = '/').str.strip('/'))
.rename(columns = lambda col: f"{col}_combined" if col != "l" else f"{col}_name")
)

Output:

     hobbie_combined     l_name name_combined profession_combined
0              Dance     García          Juan              Doctor
1              Music    Sánchez         Pedro              Doctor
2  Music/Paint/Dance  Hernández           Ana           architect

Since this is essentially string concatenation, you can use a for loop to get better performance:

out = df.groupby(df.columns.str.split('_').str[0], axis = 1)
contents = {}

for key, frame in out:
    value = frame.apply("/".join, axis = 1).str.strip("/")
    if key == 'l':
        contents["l_name"] = value
    else:
        contents[f"{key}_combined"] = value
pd.DataFrame(contents)

Output:

     hobbie_combined     l_name name_combined profession_combined
0              Dance     García          Juan              Doctor
1              Music    Sánchez         Pedro              Doctor
2  Music/Paint/Dance  Hernández           Ana           architect

You could drop down further into plain Python inside the apply step, but optimize only if you need to.
You can also use a MultiIndex together with a for loop.
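One way to read the "plain Python" remark is sketched below: group the column names in a dict, then join each row's non-empty values with a list comprehension. This is an illustrative variant, not the answerer's code; `groups` and `combined` are assumed names, and it keeps the base names rather than adding a `_combined` suffix. Because empty strings are filtered before joining, a blank between two filled cells cannot leave a double slash behind, which `str.strip('/')` alone would not remove.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'name_1': ['Juan', '', ''],
    'name_2': ['', 'Pedro', ''],
    'name_3': ['', '', 'Ana'],
    'l_name': ['García', 'Sánchez', 'Hernández'],
    'profession_4': ['Doctor', 'Doctor', ''],
    'profession_5': ['', '', 'architect'],
    'hobbie_6': ['Dance', '', 'Music'],
    'hobbie_7': ['', 'Music', 'Paint'],
    'hobbie_8': ['', '', 'Dance'],
})

# Map each base name (suffix "_<digits>" removed) to its columns, in original order
groups = {}
for col in df.columns:
    groups.setdefault(re.sub(r'_\d+$', '', col), []).append(col)

# For every group, walk the rows as tuples and join only the non-empty values
combined = pd.DataFrame(
    {
        base: ['/'.join(v for v in row if v)
               for row in zip(*(df[c] for c in cols))]
        for base, cols in groups.items()
    },
    index=df.index,
)
print(combined)
```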

nimxete2 #3

A possible solution:

out = df.replace('', pd.NA).bfill(axis=1)
out = pd.concat([
    out['name_1'], out['l_name'], out['profession_4'],
    out.filter(like='hobbie_').apply(
        lambda x: '/'.join(x.dropna().drop_duplicates()), axis=1)], axis=1)
out.columns = ['name_combined', 'l_name', 'profession_combined', 'hobbie_combined']

Another possible solution:

(df.assign(id = df.index).melt('id')
 .mask(lambda x: x['value'].eq('')).dropna()
 .assign(variable = lambda x: x['variable']
         .str.replace(r'_\d+$', '_combined', regex=True))
 .pivot_table(index='id', columns='variable', 
              values='value', aggfunc=lambda x: '/'.join(x), sort=False)
 .reset_index(drop=True))

Output:

variable name_combined     l_name profession_combined    hobbie_combined
0                 Juan     García              Doctor              Dance
1                Pedro    Sánchez              Doctor              Music
2                  Ana  Hernández           architect  Music/Paint/Dance
