pandas: extract the year from a column containing movie title strings

Posted by y53ybaqx on 2023-01-15 in Other

I have the following data in a table train_df with two columns, "title name" and "gross":

gross       title name
760507625.0 Avatar (2009)
658672302.0 Titanic (1997)
652270625.0 Jurassic World (2015)
623357910.0 The Avengers (2012)
534858444.0 The Dark Knight (2008)
532177324.0 Rogue One (2016)
474544677.0 Star Wars: Episode I - The Phantom Menace (1999)
459005868.0 Avengers: Age of Ultron (2015)
448139099.0 The Dark Knight Rises (2012)
436471036.0 Shrek 2 (2004)
424668047.0 The Hunger Games: Catching Fire (2013)
423315812.0 Pirates of the Caribbean: Dead Man's Chest (2006)
415004880.0 Toy Story 3 (2010)
409013994.0 Iron Man 3 (2013)
408084349.0 Captain America: Civil War (2016)
408010692.0 The Hunger Games (2012)
403706375.0 Spider-Man (2002)
402453882.0 Jurassic Park (1993)
402111870.0 Transformers: Revenge of the Fallen (2009)
400738009.0 Frozen (2013)
381011219.0 Harry Potter and the Deathly Hallows: Part 2 (2011)
380843261.0 Finding Nemo (2003)
380262555.0 Star Wars: Episode III - Revenge of the Sith (2005)
373585825.0 Spider-Man 2 (2004)
370782930.0 The Passion of the Christ (2004)

I want to remove the date (the year in parentheses) from "title name". The output should look like this:

gross   title name
760507625.0 Avatar
658672302.0 Titanic
652270625.0 Jurassic World
623357910.0 The Avengers
534858444.0 The Dark Knight

Ignore the gross column, since it does not need to change.

Answer #1 (m528fe3b)

Using str.replace with a regular expression, we can try:

train_df["title name"] = train_df["title name"].str.replace(r'\s+\(\d{4}\)$', '', regex=True)
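
Since the thread title asks about extracting the year, here is a minimal sketch that keeps the year in its own column before stripping it from the title. It uses three rows from the question's data; the column name "year" is an assumption made for this example and does not appear in the question.

```python
import pandas as pd

# Small sample taken from the question's data
train_df = pd.DataFrame({
    "gross": [760507625.0, 658672302.0, 436471036.0],
    "title name": ["Avatar (2009)", "Titanic (1997)", "Shrek 2 (2004)"],
})

# Capture the four digits of a trailing "(YYYY)" into a new column.
# "year" is a hypothetical column name chosen for this sketch.
year_text = train_df["title name"].str.extract(r"\((\d{4})\)$", expand=False)
train_df["year"] = pd.to_numeric(year_text)

# Then strip the trailing " (YYYY)" from the title, as in the answer above
train_df["title name"] = train_df["title name"].str.replace(
    r"\s+\(\d{4}\)$", "", regex=True
)

print(train_df)
```

Titles that do not end in "(YYYY)" would get a missing value in "year" and would be left unchanged by the replace step.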

Answer #2 (kr98yfug)

Another solution, without re, using only .str.rsplit():

# Split once from the right on ' (' and keep the part before it
train_df['title name'] = train_df['title name'].str.rsplit(' (', n=1).str[0]
print(train_df)

Prints:

          gross                                    title name
0   760507625.0                                        Avatar
1   658672302.0                                       Titanic
2   652270625.0                                Jurassic World
3   623357910.0                                  The Avengers
4   534858444.0                               The Dark Knight
5   532177324.0                                     Rogue One
6   474544677.0     Star Wars: Episode I - The Phantom Menace
7   459005868.0                       Avengers: Age of Ultron
8   448139099.0                         The Dark Knight Rises
9   436471036.0                                       Shrek 2
10  424668047.0               The Hunger Games: Catching Fire
11  423315812.0    Pirates of the Caribbean: Dead Man's Chest
12  415004880.0                                   Toy Story 3
13  409013994.0                                    Iron Man 3
14  408084349.0                    Captain America: Civil War
15  408010692.0                              The Hunger Games
16  403706375.0                                    Spider-Man
17  402453882.0                                 Jurassic Park
18  402111870.0           Transformers: Revenge of the Fallen
19  400738009.0                                        Frozen
20  381011219.0  Harry Potter and the Deathly Hallows: Part 2
21  380843261.0                                  Finding Nemo
22  380262555.0  Star Wars: Episode III - Revenge of the Sith
23  373585825.0                                  Spider-Man 2
24  370782930.0                     The Passion of the Christ
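
The two answers give the same result on the question's data, but they may differ on titles that contain other parentheses. The sketch below compares them; the second and third titles are made-up edge cases, not taken from the question.

```python
import pandas as pd

# The first title is from the question; the other two are hypothetical edge cases
titles = pd.Series([
    "Avatar (2009)",
    "Example Film (Extended Cut)",
    "Example (Part One) Film",
])

# rsplit cuts at the last " (" regardless of what follows it
via_rsplit = titles.str.rsplit(" (", n=1).str[0]

# the regex only removes a trailing four-digit year in parentheses
via_regex = titles.str.replace(r"\s+\(\d{4}\)$", "", regex=True)

print(pd.DataFrame({"original": titles, "rsplit": via_rsplit, "regex": via_regex}))
```

If every title is known to end in " (YYYY)", the rsplit version is enough; the regex version is the more conservative choice when that is not guaranteed.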
