Python: Why is my numpy mean different from the mean calculated in Excel?

Posted by de90aj5v on 2023-10-15 in Python


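My dataset, "filtered_test.xlsx", contains information about some hotels in the United States, including their ratings. I want to compute the mean rating grouped by hotel name. In the "filtered_test.xlsx" file I have already removed the entries whose value is 0 or 'NaN'. The code is as follows: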

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from matplotlib import rcParams  # for chart

df = pd.read_excel("filtered_test.xlsx")

# Creates the average hotel ratings by hotel name
hotel_rating = df.groupby('name')['reviews.rating'].mean().reset_index()
hotel_rating.to_excel("hotel_ratings.xlsx", index=False)
hotelRating_df = pd.read_excel("hotel_ratings.xlsx")

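Basically, the mean I get from numpy does not match the correct value I get when I calculate it in Excel.
For example:
[image: example from filtered excel] This image shows that one hotel has a single value, 4. So shouldn't the average rating in my hotel_ratings.xlsx be 4 for this "Extended Stay America" hotel? But in hotel_ratings.xlsx the mean for this hotel is [image: numpy mean of said hotel], which is not the value it should be.
How can I fix this?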

python

Source: https://stackoverflow.com/questions/77259984/why-is-my-numpy-mean-outcome-different-from-calculating-the-mean-by-excel

1 Answer


r1zhe5dt1#

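Filters applied in Excel do not carry over to Python: pd.read_excel reads every row in the sheet, including rows that a filter only hides from view. Perhaps what you are looking for is closer to: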

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 

from matplotlib import rcParams #for chart

df = pd.read_excel("filtered_test.xlsx")
df.fillna(-1,inplace=True)#Replace null values with -1
valid_ratings = df[df['reviews.rating']>-1] #Filter out rows that have their rating as -1
#if you want to filter out zeros, just change fillna and valid_ratings to use 0

#Creates the average hotel ratings by hotel name
hotel_rating = valid_ratings.groupby('name') ['reviews.rating'].mean().reset_index()
hotel_rating.to_excel("hotel_ratings.xlsx", index = False)
hotelRating_df = pd.read_excel("hotel_ratings.xlsx")
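
To see why the two means can differ, it may help to compare what pandas actually counts per hotel with what remains after filtering. The sketch below is only an illustration: the small in-memory DataFrame and the hotel names "Hotel A" and "Hotel B" are made-up stand-ins for the rows that pd.read_excel would return from filtered_test.xlsx, not the asker's real data. It relies on the fact that mean() skips NaN by default but still counts zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the rows pandas reads from the workbook,
# including rows that an Excel filter would only hide from view.
df = pd.DataFrame({
    "name": ["Hotel A", "Hotel A", "Hotel A", "Hotel B", "Hotel B"],
    "reviews.rating": [4.0, 0.0, np.nan, 5.0, 3.0],
})

# Per hotel: total rows, non-null ratings, and the unfiltered mean.
summary = df.groupby("name")["reviews.rating"].agg(["size", "count", "mean"])
print(summary)

# Keep only rows with a real, non-zero rating, then average per hotel.
ratings = df["reviews.rating"]
valid_ratings = df[ratings.notna() & (ratings > 0)]
hotel_rating = valid_ratings.groupby("name")["reviews.rating"].mean().reset_index()
print(hotel_rating)
```

If the "size" column for a hotel is larger than the number of rows visible in Excel, the extra rows are the ones the Excel filter was hiding, and they need to be filtered out again in pandas before taking the mean.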

2023-10-15
