Unable to plot a stacked bar plot with matplotlib

Posted by nfeuvbwi on 2023-03-03 in Other

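I have a data frame in which the first column holds the bacteria names and the remaining columns are samples. I want to draw a stacked bar chart showing the bacterial community composition of each sample.
Here is my data frame (ignore the extremely small percentages; I sorted the bacteria names alphabetically, and yes, every column sums to 100%):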
[Screenshot of the data frame]
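Because every sample column is meant to sum to 100%, the very small percentages can be merged into a single "Other" row without changing the column totals, which also keeps the legend of a stacked chart short. A minimal sketch, assuming an arbitrary cut-off of 0.1% and using a few rows copied from the sample data below; `lump_rare` and the threshold are hypothetical, not part of pandas or of the original code:

```python
import io

import pandas as pd

# A few rows copied from the sample data in the question (comma-separated).
CSV_EXCERPT = """#Classification,S25.tabular,S26.tabular,S27.tabular,S37.tabular
A2,0.0,0.0,0.0,0.00042261140036513626
Actinomyces,0.5717667698876547,0.12756192810923292,0.3815949563101427,20.320846575157212
Alloprevotella,0.32445914668301296,0.005701091759071862,0.8649485676363234,3.8626681993373455
Atopobium,2.7827391961445374,0.6292580029075568,1.9168233602477602,0.7053384272094124
Bdellovibrio,0.00951183166171699,0.0,0.005530361685654242,0.0038035026032862264
"""


def lump_rare(df, threshold=0.1, label_col="#Classification"):
    """Hypothetical helper: merge every taxon whose largest abundance across
    samples is below `threshold` (percent) into one 'Other' row.
    Column totals are unchanged because rows are summed, not dropped."""
    data = df.set_index(label_col)
    rare = data.max(axis=1) < threshold
    kept = data[~rare]
    other = data[rare].sum().rename("Other")
    return pd.concat([kept, other.to_frame().T])


df = pd.read_csv(io.StringIO(CSV_EXCERPT))
print(df.drop(columns="#Classification").sum())  # per-sample totals before lumping
compact = lump_rare(df)
print(compact)
print(compact.sum())  # same per-sample totals after lumping
```

With the full data, `compact` has the bacteria as its index, which is the shape the plotting sketches expect once transposed.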
I tried:

import matplotlib.pyplot as plt

# grouped_sorted_df is the data frame shown above
# Create a list of sample file names
samples = grouped_sorted_df.columns[1:]

# Create a stacked bar plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(grouped_sorted_df['#Classification'], grouped_sorted_df[samples[0]], label=samples[0])
for i in range(1, len(samples)):
    ax.bar(grouped_sorted_df['#Classification'], grouped_sorted_df[samples[i]], bottom=grouped_sorted_df[samples[:i]].sum(axis=1), label=samples[i])

# Set the x-axis and y-axis labels
ax.set_xlabel('#Classification')
ax.set_ylabel('Abundance')
ax.set_title('Abundance of Bacterial Genera in Sample Files')
ax.legend()

# Show the plot
plt.show()

But this produces an extremely ugly chart that is nowhere near a stacked bar chart.
[Screenshot of the resulting chart]

Sample of the dataset (comma-separated):

#Classification,S25.tabular,S26.tabular,S27.tabular,S37.tabular
A2,0.0,0.0,0.0,0.00042261140036513626
AKYG587,0.0,0.0,0.0,0.00042261140036513626
ASF356,0.0,0.0,0.0,0.00042261140036513626
Acetitomaculum,0.003170610553905664,0.0007126364698839827,0.002212144674261697,0.0046487254040164985
Acidibacter,0.004227480738540885,0.0007126364698839827,0.0011060723371308485,0.0025356684021908176
Acidipila,0.0010568701846352213,0.0,0.0,0.0
Actinomyces,0.5717667698876547,0.12756192810923292,0.3815949563101427,20.320846575157212
Actinomycetospora,0.0021137403692704426,0.0,0.0,0.00042261140036513626
Actinoplanes,0.0010568701846352213,0.0,0.0,0.0
Actinotignum,0.0,0.0,0.0011060723371308485,0.0
Aeromicrobium,0.0021137403692704426,0.0,0.0,0.0
Aggregatibacter,0.0,0.0,0.0,0.0012678342010954088
Ahniella,0.0,0.0,0.002212144674261697,0.0
Akkermansia,0.0010568701846352213,0.0014252729397679655,0.0,0.00042261140036513626
Alcanivorax,0.0,0.0,0.0,0.00042261140036513626
Alloprevotella,0.32445914668301296,0.005701091759071862,0.8649485676363234,3.8626681993373455
Altererythrobacter,0.006341221107811328,0.0,0.0011060723371308485,0.00042261140036513626
Amycolatopsis,0.0010568701846352213,0.0,0.0011060723371308485,0.00042261140036513626
Anaerococcus,0.0010568701846352213,0.0014252729397679655,0.0,0.0
Anaerofustis,0.0,0.0007126364698839827,0.0,0.0
Anaeroglobus,0.013739312400257877,0.0,0.0,0.43528974237609036
Anaeroplasma,0.04121793720077363,0.0206664576266355,0.025439663754009516,0.027469741023733858
Anaerotruncus,0.0,0.0,0.0,0.00042261140036513626
Anaerovibrio,0.00845496147708177,0.0049884552891878795,0.005530361685654242,0.0038035026032862264
Anaerovorax,0.006341221107811328,0.007126364698839828,0.003318217011392545,0.00887483940766786
Aquicella,0.004227480738540885,0.0,0.0,0.00042261140036513626
Arenimonas,0.0,0.0,0.002212144674261697,0.0
Atopobium,2.7827391961445374,0.6292580029075568,1.9168233602477602,0.7053384272094124
Bdellovibrio,0.00951183166171699,0.0,0.005530361685654242,0.0038035026032862264
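With data in this layout (bacteria as rows, samples as columns), one way to get one bar per sample is to transpose the frame and let pandas do the stacking: `DataFrame.plot(kind="bar", stacked=True)` draws one bar per row and one stacked segment per column. This is a swapped-in pandas approach rather than the loop from the question; a minimal sketch using four rows of the sample above:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Four rows copied from the sample data in the question.
CSV_EXCERPT = """#Classification,S25.tabular,S26.tabular,S27.tabular,S37.tabular
Actinomyces,0.5717667698876547,0.12756192810923292,0.3815949563101427,20.320846575157212
Alloprevotella,0.32445914668301296,0.005701091759071862,0.8649485676363234,3.8626681993373455
Anaeroplasma,0.04121793720077363,0.0206664576266355,0.025439663754009516,0.027469741023733858
Atopobium,2.7827391961445374,0.6292580029075568,1.9168233602477602,0.7053384272094124
"""

df = pd.read_csv(io.StringIO(CSV_EXCERPT))

# Transpose so rows are samples and columns are bacteria:
# one bar per sample, one stacked segment per bacterium.
per_sample = df.set_index("#Classification").T

ax = per_sample.plot(kind="bar", stacked=True, figsize=(10, 6))
ax.set_xlabel("Sample")
ax.set_ylabel("Abundance (%)")
# Place the legend outside the axes so it does not cover the bars.
ax.legend(title="Genus", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
# plt.show()  # uncomment when running with an interactive backend
```

With the full table the legend gets one entry per genus, so merging rare genera first or passing `legend=False` may be worth considering.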
Answer 1, by uqjltbpv

Once the bars have been plotted (for example by pandas), you can change their width with matplotlib's setp, like this:

for i in ax.containers:
    plt.setp(i, width=1) #Change width here 
plt.tight_layout()
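This works because each entry of ax.containers is a BarContainer (a tuple of Rectangle patches, one per ax.bar call), and plt.setp applies the keyword to every artist in the sequence. A small self-contained sketch with made-up bar heights; note that Rectangle.set_width keeps each bar's left edge fixed, so widened bars grow to the right rather than around their tick. Passing width=1 directly to ax.bar keeps the bars centred when you control the plotting call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Two stacked series with made-up heights, drawn with narrow bars.
ax.bar(["S1", "S2", "S3"], [1, 2, 3], width=0.5, label="first")
ax.bar(["S1", "S2", "S3"], [1, 1, 1], bottom=[1, 2, 3], width=0.5, label="second")

# One BarContainer per ax.bar call; setp calls set_width on each Rectangle in it.
for container in ax.containers:
    plt.setp(container, width=1)

widths = [patch.get_width() for patch in ax.patches]
print(widths)  # every bar is now 1 unit wide
```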

To fix the x-axis ticks, just reduce the tick frequency:

ax.set_xticks(ax.get_xticks()[::4])  # change the step (4) until the graph looks good
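The same thinning can be written with explicit integer bar positions, so that the kept tick positions and their labels are sliced with the same step. A minimal sketch with 20 hypothetical genus names; names, positions and step are illustrative, not from the original data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

names = [f"genus_{i:02d}" for i in range(20)]  # hypothetical category labels
positions = np.arange(len(names))              # one integer slot per bar

fig, ax = plt.subplots()
ax.bar(positions, np.arange(1, 21))

step = 4  # show every 4th label; adjust until the axis is readable
ax.set_xticks(positions[::step])
ax.set_xticklabels(names[::step], rotation=90)

fig.canvas.draw()  # render once so the tick label texts are populated
shown = [label.get_text() for label in ax.get_xticklabels()]
print(shown)
```

All 20 bars are still drawn; only the labels are thinned. If every label must stay visible, rotating them with ax.tick_params(axis="x", labelrotation=90) and a wider figure is another option.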

In your code:

import matplotlib.pyplot as plt

# Create a list of sample file names
samples = grouped_sorted_df.columns[1:]

# Create a stacked bar plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(grouped_sorted_df['#Classification'], grouped_sorted_df[samples[0]], label=samples[0])
for i in range(1, len(samples)):
    ax.bar(grouped_sorted_df['#Classification'], grouped_sorted_df[samples[i]], bottom=grouped_sorted_df[samples[:i]].sum(axis=1), label=samples[i])

# Set the x-axis and y-axis labels
ax.set_xlabel('#Classification')
ax.set_ylabel('Abundance')
ax.set_title('Abundance of Bacterial Genera in Sample Files')
ax.legend()
ax.set_xticks(ax.get_xticks()[::4])  # keep every 4th tick; change the step as needed

for i in ax.containers:
    plt.setp(i, width=1)  # change the bar width here

# Show the plot
plt.tight_layout()
plt.show()
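The code above still puts the bacteria on the x-axis and stacks the samples on top of each other. If the intended chart is one bar per sample with the bacteria stacked inside it, the same ax.bar loop can be turned around: iterate over the bacteria and accumulate a bottom array with one entry per sample. A minimal sketch; the function name stacked_by_sample and the small demo frame (values rounded from the sample data in the question) are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stacked_by_sample(df, label_col="#Classification"):
    """Hypothetical helper: one bar per sample, one stacked segment per bacterium."""
    samples = list(df.columns.drop(label_col))
    fig, ax = plt.subplots(figsize=(10, 6))
    bottom = np.zeros(len(samples))  # running top of each sample's bar
    for _, row in df.iterrows():
        heights = row[samples].to_numpy(dtype=float)
        ax.bar(samples, heights, bottom=bottom, label=row[label_col])
        bottom = bottom + heights  # next bacterium starts where this one ends
    ax.set_xlabel("Sample")
    ax.set_ylabel("Abundance (%)")
    ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize="small")
    fig.tight_layout()
    return fig, ax


# Demo values rounded from the sample data in the question.
demo = pd.DataFrame({
    "#Classification": ["Actinomyces", "Alloprevotella", "Atopobium"],
    "S25.tabular": [0.57, 0.32, 2.78],
    "S37.tabular": [20.32, 3.86, 0.71],
})
fig, ax = stacked_by_sample(demo)
```

Called on grouped_sorted_df, this should give one bar per sample that reaches roughly 100% if each column sums to 100%.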
