pandas: Creating a matplotlib or seaborn histogram that uses percent rather than count?

Posted by h7appiyu on 2023-03-16 in Other


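Specifically, I'm working with the Kaggle Titanic dataset. I've plotted a stacked histogram showing the ages of the passengers who survived and died on the Titanic. The code is below.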

figure = plt.figure(figsize=(15,8))
plt.hist([data[data['Survived']==1]['Age'], data[data['Survived']==0]['Age']], stacked=True, bins=30, label=['Survived','Dead'])
plt.xlabel('Age')
plt.ylabel('Number of passengers')
plt.legend()

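I would like to alter the chart so that each bin shows a single bar with the percentage of that age group that survived. For example, if a bin covers ages 10-20 and 60% of the people aboard the Titanic in that age group survived, then the bar's height would line up with 60% on the y-axis.
Edit: I may have explained what I'm looking for poorly. Rather than altering the y-axis values, I'm looking to change the actual shape of the bars based on the percentage that survived.
The first bin in the graph shows roughly 65% survival in that age group. I would like that bin to line up with the y-axis at 65%. The following bins look to be 90%, 50%, 10% respectively, and so on.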

The graph would end up looking something like this:

[image]

Source: https://stackoverflow.com/questions/40092294/creating-a-matplotlib-or-seaborn-histogram-which-uses-percent-rather-than-count

5 Answers

sauutmhj1#

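For seaborn, use the stat parameter. According to the documentation, the currently supported values for stat are:

count: show the number of observations in each bin
frequency: show the number of observations divided by the bin width
density: normalize such that the total area of the histogram equals 1
probability: normalize such that the bar heights sum to 1
percent: normalize such that the bar heights sum to 100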
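
To make the difference between these normalizations concrete, here is a minimal NumPy sketch that reproduces each one by hand from raw bin counts. The sample values and bin edges are made up for illustration, and seaborn's own implementation may differ in detail.

```python
import numpy as np

# Hypothetical sample and bin edges, chosen only for illustration.
values = np.array([1, 2, 2, 3, 3, 3, 7, 8, 9, 9])
edges = np.array([0, 2.5, 5, 10])

counts, _ = np.histogram(values, bins=edges)  # raw observations per bin
widths = np.diff(edges)                       # width of each bin

stats = {
    "count": counts,
    "frequency": counts / widths,                 # observations per unit of x
    "density": counts / (counts.sum() * widths),  # area under the bars is 1
    "probability": counts / counts.sum(),         # bar heights sum to 1
    "percent": 100 * counts / counts.sum(),       # bar heights sum to 100
}

for name, heights in stats.items():
    print(name, heights)
```

Note that every option here normalizes over the whole sample rather than within each bin, so none of them by itself gives a per-bin survival rate.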

Result with stat set to count:

seaborn.histplot(
    data=data,
    x='variable',
    discrete=True,
    stat='count'
)

Result after changing stat to probability:

seaborn.histplot(
    data=data,
    x='variable',
    discrete=True,
    stat='probability'
)


vh0rcniy2#

Perhaps the following will help...
1. Split the DataFrame based on 'Survived'

df_survived=df[df['Survived']==1]
df_not_survive=df[df['Survived']==0]

2. Create the bins

age_bins=np.linspace(0,80,21)

3. Use np.histogram to generate the histogram data

survived_hist=np.histogram(df_survived['Age'],bins=age_bins,range=(0,80))
not_survive_hist=np.histogram(df_not_survive['Age'],bins=age_bins,range=(0,80))

4. Calculate the survival rate in each bin

surv_rates=survived_hist[0]/(survived_hist[0]+not_survive_hist[0])

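5. Plot

# align='edge' places each bar between its bin edges instead of centering it on the left edge
plt.bar(age_bins[:-1],surv_rates,width=age_bins[1]-age_bins[0],align='edge')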
plt.xlabel('Age')
plt.ylabel('Survival Rate')
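
The steps above can be sketched end to end without the real dataset. The small DataFrame below is a synthetic stand-in, not Titanic data, and the handling of missing ages and empty bins (dropna and the where argument) is my own addition rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Titanic data (not the real dataset).
df = pd.DataFrame({
    "Age":      [5, 8, 15, 17, 25, 27, 29, 45, 47, 70],
    "Survived": [1, 1, 0,  1,  0,  0,  1,  0,  1,  0],
})

age_bins = np.linspace(0, 80, 5)  # edges: 0, 20, 40, 60, 80

# Count survivors and non-survivors per bin, dropping missing ages first.
survived, _ = np.histogram(df.loc[df["Survived"] == 1, "Age"].dropna(), bins=age_bins)
died, _ = np.histogram(df.loc[df["Survived"] == 0, "Age"].dropna(), bins=age_bins)

total = survived + died
# Leave empty bins as NaN instead of dividing by zero.
surv_pct = np.full(total.shape, np.nan)
np.divide(100 * survived, total, out=surv_pct, where=total > 0)

print(surv_pct)
```

The resulting surv_pct array can be passed to plt.bar in place of surv_rates if a 0-100 scale is preferred over a 0-1 scale.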


z9ju0rcb3#

pd.Series.hist uses np.histogram underneath.
Let's explore that

np.random.seed([3,1415])
s = pd.Series(np.random.randn(100))
# density=True replaces the normed=True keyword used by older NumPy releases
d = np.histogram(s, density=True)
print('\nthese are the normalized counts\n')
print(d[0])
print('\nthese are the bin edges\n')
print(d[1])

these are the normalized counts

[ 0.11552497  0.18483996  0.06931498  0.32346993  0.39278491  0.36967992
  0.32346993  0.25415494  0.25415494  0.02310499]

these are the bin edges

[-2.25905503 -1.82624818 -1.39344133 -0.96063448 -0.52782764 -0.09502079
  0.33778606  0.77059291  1.20339976  1.6362066   2.06901345]

We can plot these, using the mean of each pair of adjacent bin edges as the bar position

pd.Series(d[0], pd.Series(d[1]).rolling(2).mean().dropna().round(2).values).plot.bar()

Actual answer

OR
We could simply pass density=True (normed=True in older releases) to the pd.Series.hist method, which passes it along, via matplotlib, to np.histogram

s.hist(density=True)
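
A density-normalized histogram is not yet a percentage: the heights only sum to 1 after multiplying by the bin widths. As a hedged sketch of one common workaround, the weights argument of np.histogram can turn each observation into 100/N so that the bar heights are percentages directly. The random sample and the seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
s = rng.standard_normal(100)

# density=True gives heights whose area (height * bin width) sums to 1.
density, edges = np.histogram(s, bins=10, density=True)
fractions = density * np.diff(edges)

# Weighting every observation by 100/N gives percentages directly.
percent, _ = np.histogram(s, bins=edges, weights=np.full(s.size, 100 / s.size))

print(percent)
```

The same weights array can be passed to plt.hist, which also accepts a weights keyword.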


yptwkmov4#

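The Dexplot library is capable of returning relative frequencies of groups. Currently, you will need to bin the age variable in pandas with the cut function. You can then use Dexplot.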

titanic['age2'] = pd.cut(titanic['age'], range(0, 110, 10))

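Pass the variable you would like to count (age2) to the count function. Subdivide the counts with the split parameter and normalize by age2. Also, this might be a good time for a stacked bar plot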

dxp.count('age2', data=titanic, split='survived', stacked=True, normalize='age2')
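
If Dexplot is not available, the same per-group normalization can be sketched with pandas alone. This is not what Dexplot does internally, only an equivalent table built with pd.crosstab; the tiny DataFrame is made up, with column names following the snippet above.

```python
import pandas as pd

# Small made-up sample; column names follow the snippet above.
titanic = pd.DataFrame({
    "age":      [4, 9, 15, 18, 22, 25, 28, 35, 38, 62],
    "survived": [1, 1, 0,  1,  0,  0,  1,  1,  0,  0],
})

titanic["age2"] = pd.cut(titanic["age"], range(0, 110, 10))

# normalize="index" divides each row (age group) by its own total,
# so every row holds the share that died (0) and survived (1).
rel_freq = pd.crosstab(titanic["age2"], titanic["survived"], normalize="index")

print(rel_freq)
```

Calling rel_freq.plot(kind='bar', stacked=True) on this table should give a stacked bar chart in which every bar has total height 1.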


zy1mlcev5#

First of all, it would be better to create a function that splits your data into age groups

# This function splits a column of ages into predefined age groups
def cutDF(ages):
    return pd.cut(
        ages, [0, 10, 20, 30, 40, 50, 60, 70, 80],
        labels=['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80'])

data['AgeGroup'] = cutDF(data['Age'])
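
One detail worth checking with pd.cut is which edge of each bin is inclusive. By default the intervals are closed on the right, so with these edges an age of exactly 10 falls into '0-10' and an age of exactly 0 falls into no group at all. The sketch below uses a handful of made-up ages to show this, along with the include_lowest option as one possible way to keep the lowest edge.

```python
import pandas as pd

ages = pd.Series([0, 0.42, 10, 10.5, 20, 80])  # made-up boundary cases
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80]
labels = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80']

# Default: right-closed intervals (0, 10], (10, 20], ...
# so an age of exactly 0 is not assigned to any group.
default_groups = pd.cut(ages, bins, labels=labels)

# include_lowest=True makes the first interval include its left edge.
inclusive_groups = pd.cut(ages, bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default_groups, "inclusive": inclusive_groups}))
```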

The plot can then be drawn as follows:

survival_per_age_group = data.groupby('AgeGroup')['Survived'].mean()

# Creating the plot that will show survival % per age group
ax = survival_per_age_group.plot(kind='bar', color='green')
ax.set_title("Survivors by Age Group", fontsize=14, fontweight='bold')
ax.set_xlabel("Age Groups")
ax.set_ylabel("Percentage")
# Hide the tick marks on the top and right edges (booleans replace the old 'off' strings)
ax.tick_params(axis='x', top=False)
ax.tick_params(axis='y', right=False)
plt.xticks(rotation='horizontal')

# Importing the relevant function to format the y axis
from matplotlib.ticker import FuncFormatter

ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
plt.show()

