pandas: searching all columns with a wildcard

Posted by 67up9zun on 2023-01-07 in Other

I have data with a variable number of columns. I want to exclude every row that contains a number ending in "9999".
Sample input:

Customer Reference,Profile Name,Score,Band Text,Result1,Result2,Result3,Result4,Result5,Result6,Result7,Result8,Result9,Result10,Result11,Result12,Result13,Result14,Result15,Result16,Result17,Result18,Result19,Result20,Result21,Result22,Result23,Result24,Result25,Result26,Result27,Result28,Result29,Result30,Result31,Result32,Result33,Result34,Result35,Result36,Result37,Result38,Result39,Result40,Result41,Result42,Result43,Result44,Result45,Result46,Result47,Result48,Result49
038ff126-1ed5-4a96-bb34-3f4b595228d3,UK,1200,PASS - 2+2,155261,155101,155151,155161,155271,155251,1551001,1551101,1557031,1559500,1558101,395102,3953500,3952260,3952100,3952101,3952111,3953800,3953760,3953512,3953522,3956611,3959600,3959601,3963505,3963513,3963531,3963507,1062,2452,1332,,,,,,,,,,,,,,,,,,
87529660,Germany,1111,APPROVED,2289528,401126,401102,401423,401424,401426,4012100,4012101,4012111,4012260,4013500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
37a52968-8093-41e5-8a2e-6bd251d0666d,UK,2200,PASS - 2+2,155261,155101,155151,155161,1551001,1551101,1551111,1551121,1551071,1551072,1553520,1556518,1557031,395102,3953500,3952260,3952100,3952101,3952111,3953512,3953521,3956760,3956600,3956601,3956611,3963504,3963508,3963512,3963526,3963529,3963500,3964510,1062,2452,1332,,,,,,,,,,,,,,
138629690,Germany,1111,APPROVED,2283524,2283525,2282111,2282100,2282101,2282263,2282264,2282266,2282260,4012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1d45f78b-01c5-4007-8f8c-a9fb845cba1f,UK,1300,PASS - 2+2,155261,155101,155151,155161,155131,1551001,1551011,1551021,1553508,1551101,1551111,1551121,1551071,1551072,1553522,3952,3962,1062,2452,1332,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
a56b590b-b8bd-4e56-987e-f801a37e487d,UK,1300,PASS - 2+2,155261,155101,155151,155161,155131,1551001,1551011,1551021,1553508,1559999,1551111,1551121,1551071,1551072,1553522,1556514,3952,3962,1062,2452,1332,,,,,,,,,,,,,,,,,,,,,,,,,,,,
138888892,Germany,2221,APPROVED,2283525,2282111,2282100,2282101,2284530,2284266,2288263,2288264,2288260,401126,401102,401423,401424,401426,4012100,4012101,4012111,4012260,4013500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Note that different files have different numbers of "Result" columns.
The only approach I have found that works is this:

filter1 = data.Result1.astype(str).str.contains('9999')
filter2 = data.Result2.astype(str).str.contains('9999')
filter3 = data.Result3.astype(str).str.contains('9999')
filter4 = data.Result4.astype(str).str.contains('9999')

and then applying the filters like this:

data.where(filter1 | filter2 | filter3 | filter4 ... etc

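Obviously this is painful, and I have to change the number of filters for every file.
I'm sure there is a way to do this, but I haven't found it yet. Can anyone help?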


qnzebej0 #1

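Concatenate all the Result* columns into one new column and search that column:

# Collect the Result* columns, however many there are
clms_to_concat = [i for i in df.columns if 'Result' in i]
# Build one helper string per row, with values separated by '_'
df['combined'] = ''
for c in clms_to_concat:
    df['combined'] = df['combined'] + '_' + df[c].astype(str)
# Selects the rows containing '9999'; put ~ in front of the mask to exclude them instead
df[df.combined.str.contains('9999')]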
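
The same idea can be written without the explicit loop and turned into an exclusion filter. This is only a minimal sketch: the frame `df` below is made up for illustration, and the `(?:_|$)` part of the pattern is an addition so that only values ending in 9999 match, rather than any value that merely contains it.

```python
import pandas as pd

# Hypothetical stand-in for the real file (values invented for illustration)
df = pd.DataFrame({
    "Customer Reference": ["a", "b", "c"],
    "Result1": [155261, 1559999, 9999123],
    "Result2": [3952, 3962, 4012],
})

result_cols = [c for c in df.columns if "Result" in c]

# Join the Result* values of each row into one '_'-separated string
combined = df[result_cols].astype(str).apply("_".join, axis=1)

# True where some value ends in 9999, i.e. '9999' is followed by '_' or end of string
has_9999 = combined.str.contains(r"9999(?:_|$)")

# Keep only the rows without such a value
kept = df[~has_9999]
print(kept)
```

Building the string as a separate Series also avoids adding a helper column to `df` itself.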

xwbd5t1u #2

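I think you can use a combination of replace and dropna.
This is based partly on an answer here:
1. First, replace all values ending in 9999 with np.nan.
2. Then drop the rows that contain NaN.
It should look something like this: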

df.replace('.9999',np.nan,regex=True).dropna(axis = 0, how = 'any')
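
As written, the line above may not do what is intended on the sample data. Regex replacement only acts on string values, so numeric Result columns are left untouched, and `dropna(how='any')` drops every row with a missing value anywhere, which here is every row because of the trailing empty cells. A sketch of the same replace-then-drop idea that works around both points, assuming a small made-up frame in place of the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file (values invented for illustration)
df = pd.DataFrame({
    "Customer Reference": ["a", "b", "c"],
    "Result1": [155261, 1559999, 401126],
    "Result2": [3952.0, 3962.0, np.nan],  # trailing empty cell, as in the sample
})

# Work on the Result* columns only, as strings; an existing NaN becomes the text 'nan'
results = df.filter(like="Result").astype(str)

# Replace every value that ends in 9999 with NaN ...
# Note: values in float columns render like '3952.0', so a 9999 value
# sitting in such a column would not match this pattern.
marked = results.replace(r"^.*9999$", np.nan, regex=True)

# ... then drop the rows where that happened and reuse the surviving index
kept = df.loc[marked.dropna(axis=0, how="any").index]
print(kept)
```

Because the original empty cells were turned into the text 'nan' before the replacement, only the rows flagged by the regex are dropped.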

mklgxw1f #3

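The following approach considers only the columns labelled Result and filters out the rows in which any of those columns has a value ending in 9999:

df[~df.filter(like='Result').astype(str).apply(lambda x: x.str[-4:] == '9999').any(axis=1)]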
Customer Reference Profile Name  Score   Band Text  ...  Result46  Result47  Result48  Result49
0  038ff126-1ed5-4a96-bb34-3f4b595228d3           UK   1200  PASS - 2+2  ...       NaN       NaN       NaN       NaN
1                              87529660      Germany   1111    APPROVED  ...       NaN       NaN       NaN       NaN
2  37a52968-8093-41e5-8a2e-6bd251d0666d           UK   2200  PASS - 2+2  ...       NaN       NaN       NaN       NaN
3                             138629690      Germany   1111    APPROVED  ...       NaN       NaN       NaN       NaN
4  1d45f78b-01c5-4007-8f8c-a9fb845cba1f           UK   1300  PASS - 2+2  ...       NaN       NaN       NaN       NaN
6                             138888892      Germany   2221    APPROVED  ...       NaN       NaN       NaN       NaN
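
One thing to watch with the string comparison: a Result column that contains empty cells is read as float, so 1559999 becomes the string '1559999.0' and `x.str[-4:] == '9999'` no longer matches. In the rows shown this does not bite, because the offending value sits in Result10, which has no empty cells. A numeric sketch that sidesteps the issue, assuming all Result values are non-negative whole numbers and using a small made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file (values invented for illustration)
df = pd.DataFrame({
    "Customer Reference": ["a", "b", "c", "d"],
    "Result1": [155261, 155101, 401126, 2283525],
    "Result2": [3952.0, 1559999.0, np.nan, 99990.0],  # float because of the gap
})

results = df.filter(like="Result")

# A whole number ends in 9999 exactly when its remainder modulo 10000 is 9999.
# NaN % 10000 is NaN, which compares unequal, so empty cells never match.
ends_in_9999 = results.mod(10_000).eq(9999).any(axis=1)

# Keep only the rows with no such value
kept = df[~ends_in_9999]
print(kept)
```

This avoids the string conversion altogether, at the cost of assuming the Result columns are purely numeric.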
