pandas 如何使用从列表中筛选字符串值的方法在 Dataframe 中创建列?

zengzsys  于 2023-02-02  发布在  其他
关注(0)|答案(2)|浏览(99)

我有以下格式的 Dataframe (实际 Dataframe 包含10000多行)

Occupation                  Education
Engineer                    High School    
Neurosurgeon                Masters
Electrical Engineer         Masters
Mechanical Engineer         Masters
Software Engineer           Masters
Engineer                    Masters
Business Executive          Masters
Sales Executive             Bachelors
Neurosurgeon                Masters
Electrical Engineer
Accountant                  Bachelors
Sales Executive             Masters

我想添加基于选择性筛选的列
我需要我的结果是这样的

Occupation                  Education               Welfare_Cost
Engineer                    High School             50 
Neurosurgeon                Masters                 50
Electrical Engineer         Masters                 100
Mechanical Engineer         Masters                 100
Software Engineer           Masters                 100
Engineer                    Masters                 100
Business Executive          Masters                 100
Sales Executive             Bachelors               50
Neurosurgeon                Masters                 50
Electrical Engineer                                 50
Accountant                  Bachelors               50 
Sales Executive             Masters                 100

我只想处理职业包含列表中的字符串和Education is Masters的行,我尝试使用下面的代码来实现这一点,但一直得到错误。

lis=['Engineer','Executive','Teacher']

df['Welfare_Cost']=np.where(((df['Education']=='Masters')&
                        (df['Occupation'].str.contains(i for i in lis))),        
                      100,50)

我知道我也可以通过运行迭代循环来为每一行创建一个列表,并将该列表添加为一列,但我有许多列表组合,因此我正在寻找一种不使用迭代循环就能完成此操作的方法。

332nm8kg

332nm8kg1#

join\b\b一起用于单词边界,将|用于正则表达式or

lis=['Engineer','Executive','Teacher']

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)

或者:

df['Welfare_Cost'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

print (df)
             Occupation  Education  Welfare_Cost
0         Engineer High     School            50
1          Neurosurgeon    Masters            50
2   Electrical Engineer    Masters           100
3   Mechanical Engineer    Masters           100
4     Software Engineer    Masters           100
5              Engineer    Masters           100
6    Business Executive    Masters           100
7       Sales Executive  Bachelors            50
8          Neurosurgeon    Masters            50
9   Electrical Engineer        NaN            50
10           Accountant  Bachelors            50
11      Sales Executive    Masters           100

可能存在差异,参见变更数据-\b\b匹配字符串,无子字符串:

lis=['Engineer','Executive','Teacher']

df['Welfare_Cost1'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains('|'.join(lis)))),
                              100,50)

pat = '|'.join(r"\b{}\b".format(x) for x in lis)

df['Welfare_Cost2'] = np.where(((df['Education']=='Masters') & 
                              (df['Occupation'].str.contains(pat))),
                              100,50)

print (df)
              Occupation  Education  Welfare_Cost1  Welfare_Cost2
0          Engineer High     School             50             50
1           Neurosurgeon    Masters             50             50
2   Electrical Engineers    Masters            100             50
3    Mechanical Engineer    Masters            100            100
4      Software Engineer    Masters            100            100
5               Engineer    Masters            100            100
6     Business Executive    Masters            100            100
7        Sales Executive  Bachelors             50             50
8           Neurosurgeon    Masters             50             50
9    Electrical Engineer        NaN             50             50
10            Accountant  Bachelors             50             50
11      Sales Executives    Masters            100             50
lymnna71

lymnna712#

在您的示例中,过滤器列表只包含职业名称的基本部分(语义上),因此检查str.endswith就足够了:

df['Welfare_Cost']=np.where((df['Education']=='Masters') & df['Occupation'].str.endswith(tuple(lis)),100,50)
Occupation    Education  Welfare_Cost
0              Engineer  High School            50
1          Neurosurgeon      Masters            50
2   Electrical Engineer      Masters           100
3   Mechanical Engineer      Masters           100
4     Software Engineer      Masters           100
5              Engineer      Masters           100
6    Business Executive      Masters           100
7       Sales Executive    Bachelors            50
8          Neurosurgeon      Masters            50
9   Electrical Engineer         None            50
10           Accountant    Bachelors            50
11      Sales Executive      Masters           100

相关问题