Regex: extract the two-letter state abbreviation from a string column into a new column [duplicate]

Posted by rur96b6h on 2023-08-08 in Other

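This question already has answers here:

extract multiple two consecutive letters from a string using regex in python (3 answers)
regex for state abbreviations (python) (2 answers)
Closed 11 days ago.
I need to extract the two-letter state abbreviation from the full_address column.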

id  position                                            name  score  \
0   1        19               PJ Fresh (224 Daniel Payne Drive)    NaN   
1   2         9                  J' ti`'z Smoothie-N-Coffee Bar    NaN   
2   3         6  Philly Fresh Cheesesteaks (541-B Graymont Ave)    NaN   
3   4        17         Papa Murphy's (1580 Montgomery Highway)    NaN   
4   5       162                Nelson Brothers Cafe (17th St N)    4.7   

   ratings                                          category price_range  \
0      NaN                     Burgers, American, Sandwiches           $   
1      NaN  Coffee and Tea, Breakfast and Brunch, Bubble Tea         NaN   
2      NaN        American, Cheesesteak, Sandwiches, Alcohol           $   
3      NaN                                             Pizza           $   
4     22.0         Breakfast and Brunch, Burgers, Sandwiches         NaN   

                                        full_address zip_code        lat  \
0      224 Daniel Payne Drive, Birmingham, AL, 35207    35207  33.562365   
1  1521 Pinson Valley Parkway, Birmingham, AL, 35217    35217  33.583640   
2          541-B Graymont Ave, Birmingham, AL, 35204    35204  33.509800   
3         1580 Montgomery Highway, Hoover, AL, 35226    35226  33.404439   
4               314 17th St N, Birmingham, AL, 35203    35203  33.514730
states = [ 'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']
df['state']=df['full_address'].apply(lambda x: x if x in states  else 'N/A')

But I get N/A for every row in the state column:

state  
0  N/A  
1 N/A  
2 N/A  
3 N/A  
4 N/A  
5 N/A  
6 N/A  
7 N/A  
8 N/A  
9 N/A


How can I get the correct state abbreviations in the state column?
My desired output is:

state

0  AL 
1 AL  
2 AL  
3 AL  
4 AL 
5 AL

Answer 1 (n7taea2i)

You can do it like this, without a regular expression:

df["State"] = df["full_address"].str.split(", ").str[-2]
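As a minimal sketch of the split approach: the `apply` in the question returns N/A because it compares the whole address string against the `states` list, so nothing ever matches. Taking the second-to-last comma-separated field first and then checking it against the list avoids that. The shortened `states` set, the malformed third row, and the `where` fallback are assumptions added for illustration.

```python
import pandas as pd

# Shortened list for illustration; in practice use the full list from the question.
states = {"AK", "AL", "AR", "AZ"}

df = pd.DataFrame({
    "full_address": [
        "224 Daniel Payne Drive, Birmingham, AL, 35207",
        "1580 Montgomery Highway, Hoover, AL, 35226",
        "no commas here",  # hypothetical malformed row
    ]
})

# Second-to-last comma-separated field; NaN when the row has fewer than two fields.
candidate = df["full_address"].str.split(", ").str[-2]

# Keep only values that are known state abbreviations, otherwise fall back to "N/A".
df["State"] = candidate.where(candidate.isin(states), "N/A")

print(df["State"].tolist())
```

The membership check is optional, but it keeps an unexpected field (for example a city name in a shorter address) from ending up in the new column.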

Or this:

df["State"] = df["full_address"].str.extract(r"(?<=, )(\w{2})(?=, )", expand=False)
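One caveat with the lookaround pattern is that it returns the first two-character field between commas, which is not necessarily the state. The sketch below compares it with a pattern anchored to the end of the string. The anchored pattern assumes every address ends in ", ST, 12345" (two uppercase letters followed by a five-digit zip code), and the third row is a hypothetical address made up to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "full_address": [
        "541-B Graymont Ave, Birmingham, AL, 35204",
        "314 17th St N, Birmingham, AL, 35203",
        "12 Oak St, NE, Atlanta, GA, 30301",  # hypothetical row with an extra short field
    ]
})

# Pattern from the answer: first two-character field surrounded by ", ".
loose = df["full_address"].str.extract(r"(?<=, )(\w{2})(?=, )", expand=False)

# Assumed layout: state abbreviation directly before a trailing five-digit zip code.
df["State"] = df["full_address"].str.extract(r", ([A-Z]{2}), \d{5}$", expand=False)

print(loose.tolist())
print(df["State"].tolist())
```

Rows that do not match the pattern come back as NaN, so they can be filled with a placeholder such as "N/A" using `fillna` if needed.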

Answer 2 (rlcwz9us)

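Since the full address looks like free-form text, I would avoid trying to parse it: irregular input is bound to produce some errors.
I would use something like https://pypi.org/project/uszipcode/ and rely on the zip_code field to determine the state abbreviation.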
