pandas: how do I correctly split a column on a delimiter?

ut6juiuv  posted on 2023-03-11  in Other

I have to split the column game on the delimiter -

df: 
                                                 game                               home_team                       away_team
0                         Bordj Menail – Hamra Annaba                            Bordj Menail                    Hamra Annaba
1                                  CA Batna – US Souf                                CA Batna                         US Souf
2                                     Eulma – Ouargla                                   Eulma                         Ouargla
1860                            Bella Vista – Miramar                             Bella Vista                         Miramar
1861                 U.A.N.L.- Tigres W – Club Leon W                      U.A.N.L.- Tigres W                     Club Leon W
1862                               Queretaro – Toluca                               Queretaro                          Toluca
0                           Sport Recife - Imperatriz               Sport Recife - Imperatriz                            None
1                                    ABC - America RN                        ABC - America RN                            None
2                           Frei Paulistano - Nautico               Frei Paulistano - Nautico                            None
3                             Botafogo PB - Confianca                 Botafogo PB - Confianca                            None

I am trying:

df[team_cols] = df['game'].str.split(' – ', expand=True, n=1)

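But this only works for the top part of the data (the rows whose away_team is None are left unsplit).
When I view the data in Excel, I can see that the delimiter "appears" different.
For example: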

Sport Recife â Sport Recife ## Here delimiter is a special character?
Bordj Menail – Hamra Annaba

How can I split these values? What is causing this behavior?

3pvhb19x1#

I don't fully understand what you mean, but this is what I would do:

import pandas as pd

data = {
    'game': [
        'Bordj Menail – Hamra Annaba',
        'CA Batna – US Souf',
        'Eulma – Ouargla',
        'Bella Vista – Miramar',
        'U.A.N.L.- Tigres W – Club Leon W',
        'Queretaro – Toluca',
        'Sport Recife - Imperatriz',
        'ABC - America RN',
        'Frei Paulistano - Nautico',
        'Botafogo PB - Confianca'
    ]
}

df = pd.DataFrame(data)

# Split the game column
pattern = r'\s*[-–â]\s*'
team_cols = ['home_team', 'away_team']
df[team_cols] = df['game'].str.split(pattern, expand=True, n=1)

# Print the result
print(df)

This gives:

                               game        home_team               away_team
0       Bordj Menail – Hamra Annaba     Bordj Menail            Hamra Annaba
1                CA Batna – US Souf         CA Batna                 US Souf
2                   Eulma – Ouargla            Eulma                 Ouargla
3             Bella Vista – Miramar      Bella Vista                 Miramar
4  U.A.N.L.- Tigres W – Club Leon W         U.A.N.L.  Tigres W – Club Leon W
5                Queretaro – Toluca        Queretaro                  Toluca
6         Sport Recife - Imperatriz     Sport Recife              Imperatriz
7                  ABC - America RN              ABC              America RN
8         Frei Paulistano - Nautico  Frei Paulistano                 Nautico
9           Botafogo PB - Confianca      Botafogo PB               Confianca
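
Note that row 4 in the output above is split at the hyphen inside "U.A.N.L.-" rather than at the real separator, because the pattern allows zero whitespace around the dash. The sketch below is one possible adjustment, not a definitive fix. It assumes the real separator always has whitespace on both sides and is either a hyphen-minus (U+002D) or an en dash (U+2013); the three sample rows are taken from the question. It also prints the dash characters found in the data, and shows how an en dash looks when its UTF-8 bytes are read as Windows-1252, which is likely what Excel is displaying as "â".

```python
import unicodedata

import pandas as pd

df = pd.DataFrame({
    "game": [
        "Bordj Menail – Hamra Annaba",
        "U.A.N.L.- Tigres W – Club Leon W",
        "Sport Recife - Imperatriz",
    ]
})

# List the dash-like characters that actually occur in the column.
dashes = sorted({ch for s in df["game"] for ch in s if ch in "-–—"})
for ch in dashes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# An en dash whose UTF-8 bytes are decoded as Windows-1252 becomes three
# characters starting with "â" (probably the cause of the Excel display).
mojibake = "–".encode("utf-8").decode("cp1252")
print(repr(mojibake))

# Assumption: the real separator is surrounded by whitespace, so requiring
# \s+ on both sides skips the hyphen inside "U.A.N.L.-".
pattern = r"\s+[-–]\s+"
parts = df["game"].str.split(pattern, n=1, expand=True)
df["home_team"] = parts[0]
df["away_team"] = parts[1]

print(df)
```

With this pattern, "U.A.N.L.- Tigres W – Club Leon W" is split into "U.A.N.L.- Tigres W" and "Club Leon W". If the file itself was read with the wrong encoding, so that the strings in pandas really contain "â€“", re-reading it with the correct encoding is probably a better fix than adding "â" to the pattern.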
