pandas Python Seaborn heatmap with a custom order on both axes, values taken from a frequency table (data included)

wmomyfyw  posted on 2022-12-28  in Python

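I have this data in a frequency table, and I just want to create a heatmap with Fac1 on the Y axis, Fac2 on the X axis, and the frequency values as the cell colors. The order of the factors in Fac1 and Fac2 must stay the same as in the data (after removing duplicates from the Fac1 and Fac2 columns). After many attempts I have not been able to get this working, but I have managed to tidy the data and present it in the simplest possible form. I would really appreciate help with this.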

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

url = "https://raw.githubusercontent.com/rroyss/stack/main/dfso.csv"
df = pd.read_csv(url)

plt.subplots(figsize=(15, 30))
plt.tick_params(axis='both', which='major', labelsize=10, labelbottom=False, bottom=False, top=True, labeltop=True)

sns.heatmap(df, cmap="Blues", linewidth=1, xticklabels=True, yticklabels=True)

sh7euo9m1#

To use heatmap, you first have to reshape the DataFrame so that Fac1 becomes the index and Fac2 becomes the columns:

df2 = df.drop_duplicates().pivot_table(index='Fac1', columns='Fac2', values='Frequency Fac1-Fac2 pair', sort=False)

plt.subplots(figsize=(15, 30))
plt.tick_params(axis='both', which='major', labelsize=10, labelbottom=False, bottom=False, top=True, labeltop=True)

sns.heatmap(df2, cmap="Blues", linewidth=1, xticklabels=True, yticklabels=True)

Result: [figure: the resulting heatmap, zoomed in on the first few rows and columns]
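The `sort` argument of `pivot_table` was only added in pandas 1.3, so on older versions the call above will not accept it. A minimal sketch of an alternative, which pivots first and then restores the order of first appearance with `reindex`, is given here. The `toy` table and its values are hypothetical stand-ins for the real CSV; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy frequency table; column names follow the question's CSV.
toy = pd.DataFrame({
    "Fac1": ["b", "b", "a", "c"],
    "Fac2": ["y", "x", "y", "x"],
    "Frequency Fac1-Fac2 pair": [3, 1, 2, 5],
})

# pd.unique keeps values in order of first appearance.
row_order = list(pd.unique(toy["Fac1"]))
col_order = list(pd.unique(toy["Fac2"]))

# Pivot (which may sort the labels), then put rows and columns back
# into the order in which they first appear in the data.
table = (
    toy.drop_duplicates()
       .pivot_table(index="Fac1", columns="Fac2",
                    values="Frequency Fac1-Fac2 pair")
       .reindex(index=row_order, columns=col_order)
)
print(table)
```

The reindexed `table` can be passed to `sns.heatmap` exactly like `df2` above. Pairs that do not occur in the data show up as NaN and are left blank by the heatmap.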


nue99wik2#

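First you need to reorganize the DataFrame so that Fac1 becomes the index, Fac2 becomes the columns, and the values are aggregated from the third column. For example: df_pivoted = df.pivot_table(index='Fac1', columns='Fac2', values='Frequency Fac1-Fac2 pair').
The heatmap uses the order of the index and columns produced by pivot_table. Keeping the original order is a bit tricky, but it can be done by combining pd.Categorical (which enforces an order) with pd.unique() (which keeps the order of first appearance, unlike np.unique, which sorts).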

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/rroyss/stack/main/dfso.csv"
df = pd.read_csv(url)
df['Fac1'] = pd.Categorical(df['Fac1'], categories=pd.unique(df['Fac1']))
df['Fac2'] = pd.Categorical(df['Fac2'], categories=pd.unique(df['Fac2']))
df_pivoted = df.pivot_table(index='Fac1', columns='Fac2', values='Frequency Fac1-Fac2 pair')

fig, ax = plt.subplots(figsize=(20, 30))
sns.heatmap(data=df_pivoted, cmap='Blues', xticklabels=True, yticklabels=True, ax=ax)

ax.tick_params(axis='both', which='major', labelsize=10, labeltop=True, top=True, labelbottom=False, bottom=False)
ax.tick_params(axis='x', labelrotation=90)

plt.tight_layout()
plt.show()
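To see why the categorical conversion matters without downloading the CSV, the following sketch runs the same steps on a small made-up table. The `toy` data is hypothetical; the column names mirror the question's, and plotting is left out so the ordering itself is easy to inspect.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frequency table; column names follow the question's CSV.
toy = pd.DataFrame({
    "Fac1": ["b", "b", "a", "c"],
    "Fac2": ["y", "x", "y", "x"],
    "Frequency Fac1-Fac2 pair": [3, 1, 2, 5],
})

# pd.unique keeps order of first appearance; np.unique returns sorted values.
appearance_order = list(pd.unique(toy["Fac1"]))
sorted_order = list(np.unique(toy["Fac1"]))

# Turn both factor columns into categoricals whose category order is the
# order of first appearance, so pivot_table lays them out in that order.
ordered = toy.copy()
for col in ("Fac1", "Fac2"):
    ordered[col] = pd.Categorical(ordered[col], categories=pd.unique(ordered[col]))

pivoted = ordered.pivot_table(index="Fac1", columns="Fac2",
                              values="Frequency Fac1-Fac2 pair")
print(pivoted)
```

Here `pivoted` lists its rows and columns in the order the labels first occur in `toy`, whereas pivoting the plain string columns would typically give alphabetical order.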

If what you are after is instead a 2D histogram or a KDE plot, with the last column used as weights:

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/rroyss/stack/main/dfso.csv"
df = pd.read_csv(url)

df['Fac1'] = [int(f[5:]) for f in df['Fac1']]
df['Fac2'] = [int(f[6:]) for f in df['Fac2']]

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(20, 10))

sns.histplot(data=df, x='Fac1', y='Fac2', weights='Frequency Fac1-Fac2 pair', bins=20, color='blue', ax=ax1)
sns.kdeplot(data=df, x='Fac1', y='Fac2', weights='Frequency Fac1-Fac2 pair', color='blue', ax=ax2)

for ax in (ax1, ax2):
    ax.tick_params(axis='both', which='major', labelsize=10)

plt.tight_layout()
plt.show()
