pandas: how to replace multiple values with a single value in a new column?

Asked by qmelpv7a on 2023-03-16 in Other

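The dataset I'm currently working with has a column named Ground. Its values are the names of cricket grounds around the world. I want to create a new column whose values are the names of the countries where those grounds are located. Below is the list of ground names.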

array(['Auckland', 'Southampton', 'Johannesburg', 'Brisbane', 'Bristol',
       'Khulna', 'Wellington', 'Sydney', 'The Oval', 'Nairobi (Gym)',
       'Durban', 'Cape Town', 'Brabourne', 'Perth', 'Gqeberha',
       'Melbourne', 'Christchurch', 'Karachi', 'Manchester', 'Bridgetown',
       'Belfast', 'King City (NW)', 'Hamilton', 'Colombo (RPS)',
       'Port of Spain', 'Centurion', 'Dubai (DSC)', "Lord's",
       'Nottingham', 'Basseterre', 'Nagpur', 'Mohali', 'Colombo (PSS)',
       'Abu Dhabi', 'Hobart', 'Providence', 'Gros Islet', 'North Sound',
       'Lauderhill', 'Harare', 'Birmingham', 'Cardiff', 'Bloemfontein',
       'Kimberley', 'Adelaide', 'Pallekele', 'Mirpur', 'Eden Gardens',
       'Mombasa', 'ICCA Dubai', 'Hambantota', 'The Hague',
       'Chester-le-Street', 'Chennai', 'Pune', 'Wankhede', 'East London',
       'Bengaluru', 'Ahmedabad', 'Sharjah', 'Windhoek', 'Bulawayo',
       'Aberdeen', 'Kingstown', 'Rajkot', 'Chattogram', 'Kingston',
       'Sylhet', 'Roseau', 'Lahore', 'Bready', 'Edinburgh',
       'Dublin (Malahide)', 'Dharamsala', 'Cuttack', 'Mount Maunganui',
       'Ranchi', 'Visakhapatnam', 'Delhi', 'Napier', 'Kanpur', 'Geelong',
       'Greater Noida', 'Taunton', 'Guwahati', 'Potchefstroom',
       'Thiruvananthapuram', 'Indore', 'Nelson', 'Dehradun', 'Rotterdam',
       'Deventer', 'Amstelveen', 'Lucknow', 'Carrara', 'Al Amerat',
       'ICCA 2 Dubai', 'Canberra', 'Hyderabad', "St George's",
       'Rawalpindi', 'Paarl', 'Dunedin', 'Coolidge', 'Leeds', 'Dublin',
       'Jaipur', 'Tarouba'], dtype=object)

I did this one by one for another dataset, but this dataset is huge.

Answer 1 (5tmbdcev)

As a starting point, you can use Wikipedia:

import pandas as pd

# Read every table on the page (needs lxml, or bs4 + html5lib) and stack them
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_cricket_grounds_by_capacity')
grounds = pd.concat(dfs)

Output:

>>> grounds[['Ground', 'City', 'Country']]

                                               Ground           City      Country
0                               Narendra Modi Stadium      Ahmedabad        India
1                            Melbourne Cricket Ground      Melbourne    Australia
2                                        Eden Gardens        Kolkata        India
3   Shaheed Veer Narayan Singh International Crick...         Raipur        India
4                                       Perth Stadium          Perth    Australia
..                                                ...            ...          ...
10                                     Sheffield Park       Uckfield      England
11                Vidarbha Cricket Association Ground         Nagpur        India
12                       Indira Priyadarshini Stadium  Visakhapatnam        India
13                            Queen Elizabeth II Park   Christchurch  New Zealand
14                                   Hyde Park Ground      Sheffield      England

[201 rows x 3 columns]

Then you can map each ground to its country:

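# One row per city, so the lookup Series has a unique index
dmap = grounds.drop_duplicates('City').set_index('City')['Country']
# Keep the text before any "(" (e.g. 'Nairobi (Gym)' -> 'Nairobi'),
# strip the trailing space, then look the name up
df['Country'] = (df['Ground'].str.extract(r'([^(]+)', expand=False)
                 .str.strip()
                 .map(dmap))
print(df)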

# Output
           Ground       Country
0        Auckland   New Zealand
1     Southampton       England
2    Johannesburg  South Africa
3        Brisbane     Australia
4         Bristol       England
..            ...           ...
103      Coolidge           NaN
104         Leeds       England
105        Dublin       Ireland
106        Jaipur         India
107       Tarouba           NaN

[108 rows x 2 columns]
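
The lookup above is keyed only on City, so an entry in the question's list that is a stadium name rather than a city (for example 'Eden Gardens', which the Wikipedia table lists under Ground with City = Kolkata) comes back as NaN. Below is a minimal offline sketch, assuming a small hand-built `grounds` frame (rows copied from the output above) in place of the Wikipedia download, that tries the City lookup first and falls back to a Ground lookup. Names that match neither column still need a manual entry.

```python
import pandas as pd

# Stand-in for the Wikipedia table so the sketch runs offline;
# these rows are taken from the read_html output shown above.
grounds = pd.DataFrame({
    "Ground": ["Narendra Modi Stadium", "Melbourne Cricket Ground",
               "Eden Gardens", "Perth Stadium"],
    "City": ["Ahmedabad", "Melbourne", "Kolkata", "Perth"],
    "Country": ["India", "Australia", "India", "Australia"],
})

# A few values from the question's Ground column
df = pd.DataFrame({"Ground": ["Melbourne", "Eden Gardens", "Perth",
                              "Ahmedabad", "Nairobi (Gym)"]})

# Two lookups, each with a unique index (required by Series.map)
by_city = grounds.drop_duplicates("City").set_index("City")["Country"]
by_ground = grounds.drop_duplicates("Ground").set_index("Ground")["Country"]

# Drop any "(...)" suffix and surrounding whitespace
key = df["Ground"].str.extract(r"([^(]+)", expand=False).str.strip()

# City match first, stadium-name match as a fallback; unmatched stay NaN
df["Country"] = key.map(by_city).fillna(key.map(by_ground))
print(df)
```

Remaining NaN rows (here 'Nairobi (Gym)', because the stand-in table has no Nairobi row) are a convenient checklist of grounds to add by hand.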

Answer 2 (vu8f3i0k)

I think you can use df.replace, passing a dict as the to_replace argument that maps each cricket ground name to its country name.
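
A minimal sketch of the replace suggestion, assuming a hand-written dict (`ground_to_country` is a hypothetical name, with only a few entries filled in). It calls replace on the Ground column as a Series so the original column is kept and the result goes into a new one, and contrasts it with map: replace leaves grounds missing from the dict unchanged, while map turns them into NaN.

```python
import pandas as pd

df = pd.DataFrame({"Ground": ["Auckland", "Johannesburg", "Leeds", "Tarouba"]})

# Hypothetical mapping; it would need one entry per ground in the data
ground_to_country = {
    "Auckland": "New Zealand",
    "Johannesburg": "South Africa",
    "Leeds": "England",
}

# replace(): values not in the dict are passed through unchanged
df["Country_replace"] = df["Ground"].replace(ground_to_country)

# map(): values not in the dict become NaN, so gaps are easy to spot
df["Country_map"] = df["Ground"].map(ground_to_country)
print(df)
```

With replace, the unmapped 'Tarouba' silently ends up as a "country"; with map it shows up as NaN, which is usually easier to audit on a large dataset.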
