matplotlib Python DataFrame - Plotting a bar chart for a DataFrame with group-by columns (at least two columns)

Posted by tvz2xvvm on 2022-11-24 in Python

I have been struggling to recreate this Excel chart in Python using matplotlib:

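The data is in a DataFrame, and I am trying to automate the process of generating this chart.
I tried splitting the DataFrame and plotting subplots, but I could not reproduce the "Zone" index that Excel shows. I did manage to plot the chart without the "Zone" index, but that is not what I really want.
Here is my code: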

import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(
    {
        'Factory Zone':
        ["AMERICAS","APAC","APAC","APAC","APAC","APAC","EMEA","EMEA","EMEA","EMEA"],
        'Factory Name':
        ["Chocolate Factory","Crayon Factory","Jobs Ur Us", "Gibberish US","Lil Grey", "Toys R Us","Food Inc.",
        "Pet Shop", "Bonbon Factory","Carrefour"],
        'Production Day 1':
        [24,1,9,29,92,79,4,90,42,35],
        'Production Day 2':
        [2,43,17,5,31,89,44,49,34,84]
    })
df = pd.DataFrame(data)
print(df)
# Without FactoryZone, it works:
df = df.drop(['Factory Zone'], axis=1)
image = df.plot(kind="bar")

The data looks like this:

Unnamed: 0 FactoryZone       Factory Name  Production Day 1  Production Day 2
0           1    AMERICAS  Chocolate Factory                24                43
1           2    AMERICAS     Crayon Factory                 1                17
2           3        EMEA           Pet Shop                 9                 5
3           4        EMEA     Bonbon Factory                29                31
4           5        APAC           Lil Grey                92                89
5           6    AMERICAS         Jobs Ur Us                79                44
6           7        APAC          Toys R Us                 4                49
7           8        EMEA          Carrefour                90                34
8           9    AMERICAS       Gibberish US                42                84
9          10        APAC          Food Inc.                35                62
2nc8po8w #1

You can create this plot by first building a MultiIndex for the hierarchical dataset, where level 0 is the Factory Zone and level 1 is the Factory Name.
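The snippet that built this index is not shown on this page. A minimal sketch, assuming the question's data and using `DataFrame.set_index`, could look like the following. The variable name `df` is chosen to match the plotting code in this answer; the exact call the answer's author used is an assumption.

```python
import pandas as pd

# Data copied from the question
data = pd.DataFrame({
    'Factory Zone': ["AMERICAS", "APAC", "APAC", "APAC", "APAC", "APAC",
                     "EMEA", "EMEA", "EMEA", "EMEA"],
    'Factory Name': ["Chocolate Factory", "Crayon Factory", "Jobs Ur Us",
                     "Gibberish US", "Lil Grey", "Toys R Us", "Food Inc.",
                     "Pet Shop", "Bonbon Factory", "Carrefour"],
    'Production Day 1': [24, 1, 9, 29, 92, 79, 4, 90, 42, 35],
    'Production Day 2': [2, 43, 17, 5, 31, 89, 44, 49, 34, 84],
})

# Assumption: a two-level index (zone, then factory name) is what the
# plotting code relies on via df.index.levels[0] and df.xs(zone)
df = data.set_index(['Factory Zone', 'Factory Name'])

zones = df.index.levels[0]   # unique zones, in sorted order
apac = df.xs('APAC')         # rows for one zone, indexed by factory name
print(zones.tolist())
print(apac)
```

With the index in place, `df.xs(zone)` returns one zone's factories as a frame indexed by factory name, which is what each subplot draws.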
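As Quang Hoang suggested, you can create one subplot per zone and then stick them together. The width of each subplot must be adjusted to the number of factories in its zone, using the width_ratios parameter in the gridspec_kw dictionary, so that all bars end up with the same width. From there, the formatting options are endless.
In the example below, I chose to show separation lines only between zones, using minor tick marks for that purpose. Also, because the figure width is limited to 10 inches here, I rewrote the longer labels on two lines.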

# Create figure with a subplot for each factory zone with a relative width
# proportionate to the number of factories
zones = df.index.levels[0]
nplots = zones.size
plots_width_ratios = [df.xs(zone).index.size for zone in zones]
fig, axes = plt.subplots(nrows=1, ncols=nplots, sharey=True, figsize=(10, 4),
                         gridspec_kw = dict(width_ratios=plots_width_ratios, wspace=0))

# Loop through array of axes to create grouped bar chart for each factory zone
alpha = 0.3 # used for grid lines, bottom spine and separation lines between zones
for zone, ax in zip(zones, axes):
    # Create bar chart with grid lines and no spines except bottom one
    df.xs(zone).plot.bar(ax=ax, legend=None, zorder=2)
    ax.grid(axis='y', zorder=1, color='black', alpha=alpha)
    for spine in ['top', 'left', 'right']:
        ax.spines[spine].set_visible(False)
    ax.spines['bottom'].set_alpha(alpha)
    
    # Set and place x labels for factory zones
    ax.set_xlabel(zone)
    ax.xaxis.set_label_coords(x=0.5, y=-0.2)
    
    # Format major tick labels for factory names: note that because this figure is
    # only about 10 inches wide, I choose to rewrite the long names on two lines.
    ticklabels = [name.replace(' ', '\n') if len(name) > 10 else name
                  for name in df.xs(zone).index]
    ax.set_xticklabels(ticklabels, rotation=0, ha='center')
    ax.tick_params(axis='both', length=0, pad=7)
    
    # Set and format minor tick marks for separation lines between zones: note
    # that except for the first subplot, only the right tick mark is drawn to avoid
    # duplicate overlapping lines so that when an alpha different from 1 is chosen
    # (like in this example) all the lines look the same
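    # Note: Axes.is_first_col() was deprecated in Matplotlib 3.4; on newer
    # versions use ax.get_subplotspec().is_first_col() instead
    if ax.is_first_col():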
        ax.set_xticks([*ax.get_xlim()], minor=True)
    else:
        ax.set_xticks([ax.get_xlim()[1]], minor=True)
    ax.tick_params(which='minor', length=55, width=0.8, color=[0, 0, 0, alpha])

# Add legend using the labels and handles from the last subplot
fig.legend(*ax.get_legend_handles_labels(), frameon=False, loc=(0.08, 0.77))

fig.suptitle('Production Quantity by Zone and Factory on both days', y=1.02, size=14);

References: Quang Hoang's answer, this answer by gyx-hh

kxxlusnw #2

One way to get a similar chart is to plot each Factory Zone in its own subplot and place the subplots next to each other:

# setting up the subplots
fig, axes = plt.subplots(1, len(df['Factory Zone'].unique()), 
                         figsize=(12,4),
                         sharex=True, sharey=True, 
                         gridspec_kw={'wspace':0},
                         subplot_kw={'frameon':False})

# use groupby to loop through the `Factory Zone`
for (k,d), ax in zip(df.groupby('Factory Zone'), axes):

    # plot the data into subplot
    d.plot.bar(x='Factory Name', ax=ax)
    
    # set label to the `Factory Zone`
    ax.set_xlabel(k)
    
    # remove the extra legend in each subplot
    legend = ax.legend()
    handlers = ax.get_legend_handles_labels()
    ax.legend().remove()
    ax.grid(True, axis='y')

# reinstall the last legend
ax.legend(*handlers)

Output:
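A possible refinement, borrowing the width_ratios idea from the first answer: size each subplot by the number of factories in its zone so that the bars are equally wide across zones. This is only a sketch on an abbreviated subset of the question's data, not part of the original answer; `zone_sizes` is a name introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Abbreviated subset of the question's data (assumption: four rows are
# enough to show zones of different sizes)
df = pd.DataFrame({
    'Factory Zone': ["AMERICAS", "APAC", "APAC", "EMEA"],
    'Factory Name': ["Chocolate Factory", "Crayon Factory", "Lil Grey", "Pet Shop"],
    'Production Day 1': [24, 1, 92, 90],
    'Production Day 2': [2, 43, 31, 49],
})

# Number of factories per zone, in the same sorted order groupby iterates in
zone_sizes = df.groupby('Factory Zone').size()

# One subplot per zone, each as wide as its number of factories
fig, axes = plt.subplots(1, len(zone_sizes), figsize=(12, 4), sharey=True,
                         gridspec_kw={'wspace': 0,
                                      'width_ratios': zone_sizes.tolist()})

for (zone, group), ax in zip(df.groupby('Factory Zone'), axes):
    group.plot.bar(x='Factory Name', ax=ax, legend=False)
    ax.set_xlabel(zone)
```

Because `groupby` sorts its keys by default, `zone_sizes` and the loop visit the zones in the same order, so each ratio lines up with its subplot.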

41ik7eoe #3

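One line in the solution provided by Patrick FitzGerald was deprecated in Matplotlib 3.4 and will be removed two minor releases later. (I wanted to post this as a comment rather than an answer, but I don't have enough reputation yet!)
Change:

if ax.is_first_col():

to:

if ax.get_subplotspec().is_first_col():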
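To check the replacement in isolation, here is a small sketch assuming Matplotlib 3.4 or newer, where `SubplotSpec.is_first_col()` is available. The helper name `in_first_col` is hypothetical and only used for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def in_first_col(ax):
    """Return True if `ax` sits in the first column of its subplot grid."""
    return ax.get_subplotspec().is_first_col()


fig, axes = plt.subplots(nrows=1, ncols=3)
flags = [in_first_col(ax) for ax in axes]
print(flags)  # only the leftmost subplot is in the first column
```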
