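This question already has answers here:
extract multiple two consecutive letters from a string using regex in python (3 answers)
regex for state abbreviations (python) (2 answers)
Closed 11 days ago.
I need to extract the two-letter state abbreviation from the full_address column: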
id position name score \
0 1 19 PJ Fresh (224 Daniel Payne Drive) NaN
1 2 9 J' ti`'z Smoothie-N-Coffee Bar NaN
2 3 6 Philly Fresh Cheesesteaks (541-B Graymont Ave) NaN
3 4 17 Papa Murphy's (1580 Montgomery Highway) NaN
4 5 162 Nelson Brothers Cafe (17th St N) 4.7
ratings category price_range \
0 NaN Burgers, American, Sandwiches $
1 NaN Coffee and Tea, Breakfast and Brunch, Bubble Tea NaN
2 NaN American, Cheesesteak, Sandwiches, Alcohol $
3 NaN Pizza $
4 22.0 Breakfast and Brunch, Burgers, Sandwiches NaN
full_address zip_code lat \
0 224 Daniel Payne Drive, Birmingham, AL, 35207 35207 33.562365
1 1521 Pinson Valley Parkway, Birmingham, AL, 35217 35217 33.583640
2 541-B Graymont Ave, Birmingham, AL, 35204 35204 33.509800
3 1580 Montgomery Highway, Hoover, AL, 35226 35226 33.404439
4 314 17th St N, Birmingham, AL, 35203 35203 33.514730
states = [ 'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']
df['state']=df['full_address'].apply(lambda x: x if x in states else 'N/A')
But I get N/A in the state column:
state
0 N/A
1 N/A
2 N/A
3 N/A
4 N/A
5 N/A
6 N/A
7 N/A
8 N/A
9 N/A
How can I get the correct state abbreviations in the state column?
My goal is:
state
0 AL
1 AL
2 AL
3 AL
4 AL
5 AL
2 Answers
n7taea2i1#
You can do it like this, without a regular expression:
Or this:
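One regex-free way to do this can be sketched as follows. It is an illustration, not necessarily the code this answer originally showed. It assumes every address follows the "street, city, ST, zip" layout from the question; the helper name `extract_state` is hypothetical, and `STATES` simply repeats the question's `states` list as a set.

```python
# Two-letter codes copied from the question's `states` list.
STATES = {
    'AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
    'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
    'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
    'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
    'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY',
}


def extract_state(address):
    """Return the last comma-separated field that is a known state code.

    Hypothetical helper: falls back to 'N/A' when no field matches.
    """
    # Scan the fields from the end, because the state sits near the
    # end of the address, just before the zip code.
    for part in reversed(str(address).split(',')):
        token = part.strip()
        if token in STATES:
            return token
    return 'N/A'
```

It could then be applied with `df['state'] = df['full_address'].apply(extract_state)`. The attempt in the question returns N/A because it tests whether the whole address string is in `states`, rather than one field of it.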
rlcwz9us2#
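Since the full address looks like free-form text, I would avoid trying to parse it: irregular input is bound to produce errors sooner or later.
I would use something like https://pypi.org/project/uszipcode/ and rely on the zipcode field to determine the state abbreviation.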
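A minimal sketch of that idea follows. It assumes `search` is a `uszipcode.SearchEngine` instance, and that its `by_zipcode` method returns a record with a `state` attribute (or nothing usable for an unknown code). The helper names `normalize_zip` and `state_from_zip` are hypothetical.

```python
def normalize_zip(value):
    """Return a 5-digit ZIP string, or None if the value is not numeric.

    Handles ints (35207), floats (35207.0) and 'ZIP+4' strings.
    """
    text = str(value).strip().split('-')[0].split('.')[0]
    # zfill restores leading zeros lost when the column was read as a number.
    return text.zfill(5) if text.isdigit() else None


def state_from_zip(value, search):
    """Look up the state abbreviation for a ZIP code, or 'N/A' if unknown.

    `search` is assumed to behave like uszipcode.SearchEngine.
    """
    zip_code = normalize_zip(value)
    if zip_code is None:
        return 'N/A'
    result = search.by_zipcode(zip_code)
    # Guard against both a None result and a record with an empty state.
    state = getattr(result, 'state', None)
    return state if state else 'N/A'
```

With the package installed, this might be used as `search = SearchEngine()` (imported from `uszipcode`) followed by `df['state'] = df['zip_code'].apply(lambda z: state_from_zip(z, search))`. Passing the engine in as an argument keeps the helper easy to test with a stand-in object.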