Python: extract the first 6 or 7 digits from a string, where the numbers may also contain commas [duplicate]

Posted by p8h8hvxi on 2023-01-08 in Python
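This question already has answers here:

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe? (6 answers)
Closed 4 hours ago.

I want to extract the first 6 or 7 digits from each value. The numbers may also contain commas.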

**Salary**

60000083 annually  
172829 annually  
2,50,000 annually  
2,02,000 annually  
27,00,000 annually

I am looking for the following output, with two columns: 1) Salary and 2) 6_or_7 Digit:

**Salary**               **6_or_7 Digit**  

60000083 annually        6000008  
172829 annually          172829  
2,50,000 annually        250000
2,02,000 annually        202000  
27,00,000 annually       2700000

This is what I tried:

Test['6_or_7 Digit'] = Test['Salary'].apply(lambda x: re.findall('[0-9]{1,6}', x)[0] if re.findall('[0-9]{1,6}', x) else '0').str.zfill(6)

The above extracts 6 digits only for the first two cases; it does not work for the numbers that contain commas (2,50,000, 2,02,000, 27,00,000).
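To see why the attempt breaks on the comma-separated values, here is a minimal sketch using only the standard `re` module. The list name `samples` is made up for illustration; the pattern is the one from the question. Because `[0-9]{1,6}` cannot match a comma, the first match ends at the first comma.

```python
import re

# Three of the sample values from the question (the list name is illustrative).
samples = ["60000083 annually", "172829 annually", "2,50,000 annually"]

# Take the first match of the question's pattern for each value.
# A comma is not a digit, so the match stops right before it.
first_matches = [re.findall(r"[0-9]{1,6}", s)[0] for s in samples]
print(first_matches)  # ['600000', '172829', '2']

# zfill(6) then pads the short match with zeros instead of fixing it.
padded = [m.zfill(6) for m in first_matches]
print(padded)  # ['600000', '172829', '000002']
```

So for `2,50,000 annually` the pattern only ever sees the leading `2`, which is why the commas need to be removed before matching.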


d7v8vwbk #1

How about this?
As @Zachaeus mentioned, remove the commas and then use a regex to extract the digits.

df['Salary'].str.replace(",", "", regex=False).str.extract(r"(\d{6,7})")
0
0  6000008
1   172829
2   250000
3   202000
4  2700000
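
For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the sample data from the question and writes the result into a new column. It assumes pandas is installed; `expand=False` is an addition here so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Salary": [
        "60000083 annually",
        "172829 annually",
        "2,50,000 annually",
        "2,02,000 annually",
        "27,00,000 annually",
    ]
})

# Drop the commas literally, then capture the first run of 6 or 7 digits.
# The quantifier is greedy, so 7 digits are taken whenever they are available.
df["6_or_7 Digit"] = (
    df["Salary"]
    .str.replace(",", "", regex=False)
    .str.extract(r"(\d{6,7})", expand=False)
)

print(df)
```

Note that `str.extract` returns the first match anywhere in the string, and the values stay strings. If the digits must sit at the very start of the value, anchoring the pattern as `r"^(\d{6,7})"` is one option.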

vngu2lb8 #2

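Remove the commas and extract the digits at the start of the string. I don't think you need to worry about counting the digits with this approach.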

Test['6_or_7 Digit'] = Test['Salary'].replace(',', '', regex=True).str.extract(r'(\d+)')

**Test DataFrame**

               Salary 6_or_7 Digit
0   60000083 annually     60000083
1     172829 annually       172829
2   2,50,000 annually       250000
3   2,02,000 annually       202000
4  27,00,000 annually      2700000
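
One thing to note in the output above: `\d+` keeps all 8 digits of the first row, while the question asks for `6000008`. If the 7-digit cap matters, a small helper along these lines might do. This is only a sketch; the function name `leading_digits` and its `max_len` parameter are made up for illustration.

```python
import re


def leading_digits(text: str, max_len: int = 7) -> str:
    """Return at most max_len digits from the start of text, ignoring commas."""
    cleaned = text.replace(",", "")
    # re.match only matches at the beginning of the string.
    match = re.match(r"\d+", cleaned)
    return match.group(0)[:max_len] if match else ""


print(leading_digits("60000083 annually"))   # 6000008
print(leading_digits("27,00,000 annually"))  # 2700000
```

It could then be applied per row with something like `Test['Salary'].apply(leading_digits)`.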

acruukt9 #3

So, using basic text functions (a spreadsheet formula, with the salary text in cell A1):

LEFT(SUBSTITUTE(A1,",",""),FIND(" ",SUBSTITUTE(A1,",",""),1)-1)*1
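
For readers working in Python rather than a spreadsheet, the same steps (strip the commas, take the text before the first space, convert it to a number) could be sketched roughly as follows. The function name `salary_to_number` is hypothetical.

```python
def salary_to_number(cell: str) -> int:
    """Mimic the formula: remove commas, keep the text before the first space."""
    cleaned = cell.replace(",", "")        # SUBSTITUTE(A1, ",", "")
    before_space = cleaned.split(" ")[0]   # LEFT(..., FIND(" ", ...) - 1)
    return int(before_space)               # the trailing *1 converts to a number


print(salary_to_number("2,50,000 annually"))  # 250000
print(salary_to_number("60000083 annually"))  # 60000083
```

Like the formula, this keeps every digit before the space, so the first sample stays at 8 digits rather than being cut to 7.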
