Replacing part of a string in a Pandas column: replace has no effect

fcg9iug3  posted on 2023-04-28  in  Other

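I have been trying to clean a data column by removing part of its text, but I can't figure out how to do it.
I tried the replace method on the pandas Series, but it doesn't seem to work: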

df['Salary Estimate'].str.replace(' (Glassdoor est.)', '',regex=True)

0       $53K-$91K (Glassdoor est.)
1      $63K-$112K (Glassdoor est.)
2       $80K-$90K (Glassdoor est.)
3       $56K-$97K (Glassdoor est.)
4      $86K-$143K (Glassdoor est.)
                  ...             
922                             -1
925                             -1
928    $59K-$125K (Glassdoor est.)
945    $80K-$142K (Glassdoor est.)
948    $62K-$113K (Glassdoor est.)
Name: Salary Estimate, Length: 600, dtype: object

What I expected was:

0       $53K-$91K
1      $63K-$112K
2       $80K-$90K
3       $56K-$97K
4      $86K-$143K
          ...
922            -1
925            -1
928    $59K-$125K
945    $80K-$142K
948    $62K-$113K
Name: Salary Estimate, Length: 600, dtype: object
iibxawm4  #1

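With regex enabled, you have to escape regex metacharacters such as ( ) and the dot. Left unescaped, the parentheses form a capture group instead of matching literal parentheses, so the pattern never matches the text.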

import re

>>> df['Salary Estimate'].str.replace(re.escape(r' (Glassdoor est.)'), '',regex=True)
0     $53K-$91K
1    $63K-$112K
2     $80K-$90K
3     $56K-$97K
4    $86K-$143K
Name: Salary Estimate, dtype: object

# Or without importing the re module
>>> df['Salary Estimate'].str.replace(r' \(Glassdoor est\.\)', '',regex=True)
0     $53K-$91K
1    $63K-$112K
2     $80K-$90K
3     $56K-$97K
4    $86K-$143K
Name: Salary Estimate, dtype: object
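
Since the text to remove is a fixed string rather than a pattern, another option is to switch regex matching off with regex=False, so the parentheses and the dot are taken literally. Below is a minimal sketch under that assumption. The sample frame is made up to mirror the question. The result is assigned back because str.replace returns a new Series rather than changing the column in place.

```python
import pandas as pd

# Hypothetical sample data mirroring the column in the question
df = pd.DataFrame({
    'Salary Estimate': [
        '$53K-$91K (Glassdoor est.)',
        '$63K-$112K (Glassdoor est.)',
        '-1',
    ]
})

# regex=False treats the search text as a literal string, so no escaping is needed
cleaned = df['Salary Estimate'].str.replace(' (Glassdoor est.)', '', regex=False)

# str.replace returns a new Series; assign it back to keep the change
df['Salary Estimate'] = cleaned
print(df['Salary Estimate'].tolist())
```

Rows that do not contain the suffix, such as the -1 placeholders, pass through unchanged.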

You can also extract the numbers:

>>> df['Salary Estimate'].str.extract(r'\$(?P<min>\d+)K-\$(?P<max>\d+)K')
  min  max
0  53   91
1  63  112
2  80   90
3  56   97
4  86  143
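
The extracted columns hold strings, so they probably need converting before any arithmetic. Here is a rough sketch of one way to do that. The sample Series and the mid column are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data mirroring the column in the question
s = pd.Series(
    ['$53K-$91K (Glassdoor est.)', '$63K-$112K (Glassdoor est.)', '-1'],
    name='Salary Estimate',
)

# Named groups become column names; rows that do not match yield NaN
salary = s.str.extract(r'\$(?P<min>\d+)K-\$(?P<max>\d+)K')

# Convert each extracted column from strings to numbers (NaN is preserved)
salary = salary.apply(pd.to_numeric)

# Illustrative derived column: midpoint of the range, in thousands of dollars
salary['mid'] = salary[['min', 'max']].mean(axis=1)
print(salary)
```

Rows such as -1 that do not match the pattern stay NaN in every column, which makes them easy to filter out later with dropna.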
