Selecting columns with duplicated names in pandas, e.g. "column 'a' plus any column whose name appears more than once"

utugiqy6  posted 11 months ago  in  Other

I want to select certain columns, including ones whose names are duplicated, while keeping the column names unchanged.

col_select = ["a","x","x","x"]
   a  x  x  x  z
0  6  2  7  7  8
1  6  6  3  1  1
2  6  6  7  5  6
3  8  3  6  1  8
4  5  7  5  3  0

Expected output:

   a  x  x  x
0  6  2  7  7
1  6  6  3  1
2  6  6  7  5
3  8  3  6  1
4  5  7  5  3
What I tried:

df[col_select]

boolen = []
col_commun = []
for i in range(len(col_select)):
    # Record whether each requested label exists in df.
    boolen.append(col_select[i] in df.columns)
    if boolen[i]:
        col_commun.append(col_select[i])

df_out = df.loc[:, col_commun]
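
For context, here is a minimal sketch of why selecting with the full label list does not give the expected result, using the sample frame from this thread. With duplicated column names, each "x" in the list matches all three "x" columns, so the selection repeats them. The names `attempt`, `unique_labels` and `fixed` are only illustrative.

```python
import pandas as pd

data = [[6, 2, 7, 7, 8], [6, 6, 3, 1, 1], [6, 6, 7, 5, 6],
        [8, 3, 6, 1, 8], [5, 7, 5, 3, 0]]
df = pd.DataFrame(data, columns=["a", "x", "x", "x", "z"])

col_select = ["a", "x", "x", "x"]

# Each "x" in col_select matches all three "x" columns of df,
# so the result has 1 + 3 * 3 = 10 columns rather than 4.
attempt = df[col_select]
print(attempt.shape)

# Asking for each label only once avoids the repetition.
unique_labels = list(dict.fromkeys(col_select))  # order-preserving dedupe
fixed = df[unique_labels]
print(list(fixed.columns))
```

This only works when the labels to keep are already known. The answers below deal with finding the duplicated names automatically.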
uyhoqukh1#

You can do:

out = df.loc[:, df.columns.duplicated(keep=False) | df.columns.isin(['a'])]
                                                       
   a  x  x  x
0  6  2  7  7
1  6  6  3  1
2  6  6  7  5
3  8  3  6  1
4  5  7  5  3
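
To see how the one-liner works, the mask can be built in two steps. This is only a sketch on the sample frame from this thread. The intermediate names `is_dup`, `is_wanted` and `mask` are introduced here for illustration.

```python
import pandas as pd

data = [[6, 2, 7, 7, 8], [6, 6, 3, 1, 1], [6, 6, 7, 5, 6],
        [8, 3, 6, 1, 8], [5, 7, 5, 3, 0]]
df = pd.DataFrame(data, columns=["a", "x", "x", "x", "z"])

# True for every column whose name occurs more than once;
# keep=False marks all occurrences, not only the later ones.
is_dup = df.columns.duplicated(keep=False)

# True for the columns requested explicitly by name.
is_wanted = df.columns.isin(["a"])

# A column is kept if either condition holds.
mask = is_dup | is_wanted
print(mask.tolist())  # [True, True, True, True, False]

out = df.loc[:, mask]
```

Because the selection is positional (one boolean per column), the duplicated names are never looked up by label, so nothing gets repeated.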


to94eoyn2#

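You can use the pattern from this question to identify the duplicated column names. Then just add any non-duplicated columns you want to keep to that list and select as usual: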

from collections import Counter

duplicated_cols = [col for col, count in Counter(df.columns).items() if count > 1]
df_out = df.loc[:, ["a"] + duplicated_cols]
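
One difference worth noting, shown here as a sketch on a hypothetical frame `df2` (not from the question) in which "a" is not the first column: selecting by a label list returns the columns in the order of that list, whereas a boolean mask keeps the original column order.

```python
from collections import Counter

import pandas as pd

# Hypothetical frame for illustration: "a" sits between two "x" columns.
df2 = pd.DataFrame([[1, 2, 3, 4]], columns=["x", "a", "x", "z"])

duplicated_cols = [col for col, count in Counter(df2.columns).items() if count > 1]

# Label-based selection follows the label list, so "a" comes first.
by_label = df2.loc[:, ["a"] + duplicated_cols]
print(list(by_label.columns))  # ['a', 'x', 'x']

# A boolean mask leaves the columns in their original positions.
by_mask = df2.loc[:, df2.columns.duplicated(keep=False) | df2.columns.isin(["a"])]
print(list(by_mask.columns))  # ['x', 'a', 'x']
```

For the frame in the question both give the same result, since "a" is already the first column.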


vfh0ocws3#

Code

Drop the columns whose names are unique, except 'a':

uniq_cols = df.columns.drop_duplicates(keep=False)
out = df.drop(uniq_cols.drop('a'), axis=1)

Output:

    a   x   x   x
0   6   2   7   7
1   6   6   3   1
2   6   6   7   5
3   8   3   6   1
4   5   7   5   3

Example code

import pandas as pd
data1 = [[6, 2, 7, 7, 8], [6, 6, 3, 1, 1], [6, 6, 7, 5, 6], 
         [8, 3, 6, 1, 8], [5, 7, 5, 3, 0]]
df = pd.DataFrame(data1, columns=['a', 'x', 'x', 'x', 'z'])
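
As a quick check, the three answers in this thread can be run side by side on this sample frame. This is a sketch only. The helper `same` is hypothetical and simply compares column labels and values.

```python
from collections import Counter

import pandas as pd

data1 = [[6, 2, 7, 7, 8], [6, 6, 3, 1, 1], [6, 6, 7, 5, 6],
         [8, 3, 6, 1, 8], [5, 7, 5, 3, 0]]
df = pd.DataFrame(data1, columns=['a', 'x', 'x', 'x', 'z'])

# Answer 1: boolean mask over the columns.
out1 = df.loc[:, df.columns.duplicated(keep=False) | df.columns.isin(['a'])]

# Answer 2: collect duplicated names with Counter, then select by label.
duplicated_cols = [col for col, count in Counter(df.columns).items() if count > 1]
out2 = df.loc[:, ['a'] + duplicated_cols]

# Answer 3: drop the columns whose names are unique, except 'a'.
uniq_cols = df.columns.drop_duplicates(keep=False)
out3 = df.drop(uniq_cols.drop('a'), axis=1)


def same(left, right):
    """Hypothetical helper: compare column labels and cell values."""
    return (list(left.columns) == list(right.columns)
            and left.to_numpy().tolist() == right.to_numpy().tolist())


print(same(out1, out2) and same(out2, out3))
```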
