Merging pandas MultiIndex columns that have the same value

iswrvxsc  posted on 2023-05-05 in Other

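I am extracting documents with Python and have run into a problem.
The problem
After converting a pandas DataFrame to HTML, I noticed that its columns are a MultiIndex (dataframe multiindex).
I want to merge "Name", "Measurement Point" and "Test Item" while leaving "Criterion" unchanged.
I tried using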

df.columns.map('|'.join).str.strip('|')

but it does not remove the duplicated values.
I want the expected result, but I am getting this result instead.
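
To make the symptom concrete, here is a minimal sketch of what the attempted one-liner produces. The two-column frame is a hypothetical stand-in, since the real data is only shown in the screenshots; the column labels are assumed from the description above.

```python
import pandas as pd

# Hypothetical stand-in for the real frame (labels assumed from the question)
mux = pd.MultiIndex.from_tuples([
    ("Name", "Name"),
    ("Criterion", "Min"),
])
df = pd.DataFrame(index=[0], columns=mux)

# Joining both levels keeps the repeated label, and strip('|') only
# trims separators at the ends of each string, not repeated labels
joined = df.columns.map("|".join).str.strip("|")
print(list(joined))  # ['Name|Name', 'Criterion|Min']
```

The join has no notion of duplicates, so a column whose two levels are identical ends up with its label repeated.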

a2mppw5e1#

Use a list comprehension with f-strings:

import pandas as pd

mux = pd.MultiIndex.from_tuples([('Name','Name'),
                                 ('Measurement Point','Measurement Point'),
                                 ('Test Item','Test Item'),
                                 ('Criterion','Min')])
df = pd.DataFrame(index=[0], columns=mux)
print(df)
  Name Measurement Point Test Item Criterion
  Name Measurement Point Test Item       Min
0  NaN               NaN       NaN       NaN

df.columns = [f'{a}|{b}' if a!=b else a for a, b in df.columns]

print(df)
  Name Measurement Point Test Item Criterion|Min
0  NaN               NaN       NaN           NaN
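
The comprehension above assumes exactly two column levels. If the header has more levels, one possible generalization is sketched below. `flatten_columns` is a hypothetical helper name, not part of pandas, and the three-level header is invented for illustration.

```python
import pandas as pd

def flatten_columns(columns, sep="|"):
    """Join each column tuple with sep, keeping each distinct label once."""
    flat = []
    for labels in columns:
        # dict.fromkeys drops repeats while preserving first-seen order
        unique = list(dict.fromkeys(str(label) for label in labels))
        flat.append(sep.join(unique))
    return flat

# Hypothetical three-level header
mux = pd.MultiIndex.from_tuples([
    ("Name", "Name", "Name"),
    ("Criterion", "Min", "Min"),
    ("Criterion", "Max", "Max"),
])
df = pd.DataFrame(index=[0], columns=mux)
df.columns = flatten_columns(df.columns)
print(list(df.columns))  # ['Name', 'Criterion|Min', 'Criterion|Max']
```

Note that this removes every repeated label within a tuple, not only adjacent ones, so ('A', 'B', 'A') would become 'A|B'. If that is not wanted, compare neighbouring labels instead.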

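Edit: If you need the columns to be a mix of MultiIndex and plain Index, that is not possible.
If the values in both levels are the same, you can instead replace the second-level value with an empty string: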

mux = pd.MultiIndex.from_tuples([('Name','Name'),
                                 ('Measurement Point','Measurement Point'),
                                 ('Test Item','Test Item'),
                                 ('Criterion','Min')])
df = pd.DataFrame(index=[0], columns=mux)
print(df)
  Name Measurement Point Test Item Criterion
  Name Measurement Point Test Item       Min
0  NaN               NaN       NaN       NaN
df.columns = pd.MultiIndex.from_tuples([(a,b) if a!=b else (a,'') for a, b in df.columns])

# looks OK when printed
print(df)
  Name Measurement Point Test Item Criterion
                                         Min
0  NaN               NaN       NaN       NaN

# the actual MultiIndex values
print(df.columns)
MultiIndex([(             'Name',    ''),
            ('Measurement Point',    ''),
            (        'Test Item',    ''),
            (        'Criterion', 'Min')],
           )
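
Since the question is about HTML output, the following sketch checks what the empty-string variant changes. It rebuilds the same example frame, confirms that the columns are still a two-level MultiIndex, and counts how often the label Name appears in the markup from `DataFrame.to_html` before and after. The substring count is only a rough check and assumes the label is rendered as plain cell text.

```python
import pandas as pd

mux = pd.MultiIndex.from_tuples([
    ("Name", "Name"),
    ("Measurement Point", "Measurement Point"),
    ("Test Item", "Test Item"),
    ("Criterion", "Min"),
])
df = pd.DataFrame(index=[0], columns=mux)
html_before = df.to_html()

# Blank out the second level wherever it repeats the first
df.columns = pd.MultiIndex.from_tuples(
    [(a, b) if a != b else (a, "") for a, b in df.columns]
)
html_after = df.to_html()

print(df.columns.nlevels)                    # 2
print(list(df.columns.get_level_values(1)))  # ['', '', '', 'Min']
# Rough check: occurrences of the header cell text "Name" in the markup
print(html_before.count(">Name<"), html_after.count(">Name<"))
```

The header keeps two rows, so this hides the repeated labels rather than merging the cells. Whether that is enough depends on how the HTML is consumed.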

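Another idea is to move the columns with repeated labels into the row index manually with set_index, which gives a MultiIndex on the rows: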

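# Start from the original df, whose columns are still ('Name','Name') etc.
# (not the version with empty strings in the second level)
df = df.set_index([('Name','Name'),
                   ('Measurement Point','Measurement Point'),
                   ('Test Item','Test Item')])

# The index names are tuples at this point; keep only the first element
df.index.names = [x[0] for x in df.index.names]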

print(df)
                                 Criterion
                                       Min
Name Measurement Point Test Item          
NaN  NaN               NaN             NaN
print(df.columns)
MultiIndex([('Criterion', 'Min')],
           )
             
print(df.index)
MultiIndex([(nan, nan, nan)],
           names=['Name', 'Measurement Point', 'Test Item'])
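
The set_index variant can also be written without listing the key columns by hand. The sketch below picks every column whose two levels are equal; the sample row values ("A", "P1", "T1", 0.5) are invented so the resulting index is not all NaN.

```python
import pandas as pd

mux = pd.MultiIndex.from_tuples([
    ("Name", "Name"),
    ("Measurement Point", "Measurement Point"),
    ("Test Item", "Test Item"),
    ("Criterion", "Min"),
])
# Invented sample row, for illustration only
df = pd.DataFrame([["A", "P1", "T1", 0.5]], columns=mux)

# Columns whose two levels repeat the same label become index levels
key_cols = [col for col in df.columns if col[0] == col[1]]
out = df.set_index(key_cols)

# set_index keeps the full tuples as level names; reduce them to one label
out.index.names = [name[0] for name in out.index.names]

print(list(out.index.names))  # ['Name', 'Measurement Point', 'Test Item']
print(list(out.columns))      # [('Criterion', 'Min')]
```

This keeps Criterion as a genuine two-level column while the descriptive columns become row labels, which changes the shape of the table and may or may not suit the HTML layout you need.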
