Python: How do I convert legacy Pandas indexing code to Pandas 2.0.2?

Posted by yeotifhr on 2023-06-28 in Python

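I am trying to convert some code from a book that uses Pandas 1.x to the current Pandas, but the level argument of the count method appears to have been deprecated.
Here is the code: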

MovieLens.set_index(["title", "rating"]).count(level="rating")["user_id"]
IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
print("Average movie reviews per user: ", IndUsers.mean())

IndMovies = MovieLens.set_index(["user_id", "title"]).count(level="title")["movie_id"]
print("\nNumber of Reviews Per Movie\n")
print(IndMovies)

The data structure (MovieLens) and the output are as follows:

         user_id  movie_id  rating  timestamp gender  age  occupation    zip  \
0              1      1193       5  978300760      F    1          10  48067
1              2      1193       5  978298413      M   56          16  70072
2             12      1193       4  978220179      M   25          12  32793
3             15      1193       4  978199279      M   25           7  22903
4             17      1193       5  978158471      M   50           1  95350
...          ...       ...     ...        ...    ...  ...         ...    ...
1000204     5949      2198       5  958846401      M   18          17  47901
1000205     5675      2703       3  976029116      M   35          14  30030
1000206     5780      2845       1  958153068      M   18          17  92886
1000207     5851      3607       5  957756608      F   18          20  55410
1000208     5938      2909       4  957273353      M   25           1  35401

                                               title  genres
0             One Flew Over the Cuckoo's Nest (1975)     ...
1             One Flew Over the Cuckoo's Nest (1975)     ...
2             One Flew Over the Cuckoo's Nest (1975)     ...
3             One Flew Over the Cuckoo's Nest (1975)     ...
4             One Flew Over the Cuckoo's Nest (1975)     ...
...                                              ...     ...
1000208  Five Wives, Three Secretaries and Me (1998)     ...

[1000209 rows x 10 columns]
TypeError                                 Traceback (most recent call last)
Cell In[16], line 29
     26 MovieLens = pd.merge(pd.merge(ratings, users), movies)
     28 print(MovieLens)
---> 29 MovieLens.set_index(["title", "rating"]).count(level="rating")
     30 IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")
     31 #MovieLens.set_index(["title", "rating"]).count(level="user_id")

TypeError: DataFrame.count() got an unexpected keyword argument 'level'

Expected output (typed by hand, because the book blocks copy and paste):

Average movie reviews per user: 165.597...

Number of Reviews Per Movie

$1,000,000 Duck (1971)     37
[...additional movies]
eXistenZ (1999)           410

In Pandas 2.0.2 I don't see a simple replacement for this.

  • The code raises an error -> without level, the code does not distinguish between users (it ignores user_id and treats all users as the same) -> the other counting options in Pandas 2.0.2 do not provide the functionality I need

8ulbf1ek1#

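In Pandas 2.0.2, the count method no longer accepts the level parameter (it was deprecated in pandas 1.3 and removed in 2.0). To get the same result, you can combine the groupby and size methods.
Here is how to modify the code to work with Pandas 2.0.2: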

import pandas as pd

# Assumes the MovieLens data has already been loaded into the DataFrame 'MovieLens'

# Number of ratings for each rating value
rating_counts = MovieLens.groupby(["title", "rating"]).size()
rating_counts = rating_counts.reset_index(name="count")
# Sum the per-title counts so that each rating value appears exactly once
rating_counts = rating_counts.groupby("rating")["count"].sum()
print(rating_counts)

# Number of reviews written by each user
IndUsers = MovieLens.groupby(["user_id", "movie_id"]).size()
IndUsers = IndUsers.reset_index(name="count")
# sum() adds up the rows per user; count() would only count distinct movies
IndUsers = IndUsers.groupby("user_id")["count"].sum()
print("Average movie reviews per user:", IndUsers.mean())

# Number of reviews for each movie
IndMovies = MovieLens.groupby(["title", "user_id"]).size()
IndMovies = IndMovies.reset_index(name="count")
# sum() adds up the rows per title; count() would only count distinct users
IndMovies = IndMovies.groupby("title")["count"].sum()
print("\nNumber of Reviews Per Movie\n")
print(IndMovies)

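In the modified code, groupby groups the data by the required columns and size counts the rows in each group. We then reset the index, naming the count column, and aggregate or re-index on the key needed for further analysis.
Note that this code assumes the MovieLens data has already been loaded into the DataFrame 'MovieLens' before these transformations are applied. If your DataFrame has a different name or structure, adjust the code accordingly.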
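
As a shorter alternative, here is a minimal sketch of what is probably the closest drop-in replacement for count(level=...): pass level to groupby and call count on the result. As far as I recall, this is the form the pandas 1.3 deprecation warning pointed to. The toy DataFrame is hypothetical and only stands in for the merged MovieLens frame, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged MovieLens DataFrame
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 2],
        "movie_id": [10, 20, 10, 20, 30],
        "title": ["A (1990)", "B (1991)", "A (1990)", "B (1991)", "C (1992)"],
        "rating": [5, 4, 4, 4, 3],
    }
)

# pandas 1.x: toy.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
# pandas 2.x: group on the index level, then count non-null values per column
reviews_per_user = (
    toy.set_index(["movie_id", "user_id"]).groupby(level="user_id").count()["title"]
)

# Same numbers without building a MultiIndex first
reviews_per_user_direct = toy.groupby("user_id")["title"].count()

# Reviews per movie, counted the same way
reviews_per_movie = toy.groupby("title")["movie_id"].count()

print("Average movie reviews per user:", reviews_per_user.mean())
print(reviews_per_movie)
```

Unlike size, count skips missing values in the selected column, so the two only agree when that column has no NaN entries.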
