pandas: How to annotate a scatter plot made with the prince library

b5buobof  posted on 2023-06-20  in Other

I am using the prince library to perform correspondence analysis:

from prince import CA

My contingency table dummy_contingency looks like this:

{'v1': {'0': 4.479591836734694,
  '1': 75.08163265306122,
  '2': 1.1020408163265305,
  '3': 5.285714285714286,
  '4': 14.244897959183673,
  '5': 0.0,
  '6': 94.06122448979592,
  '7': 0.5102040816326531,
  '8': 87.62244897959184,
  '9': 16.102040816326532},
 'v2': {'0': 6.142857142857143,
  '1': 24.653061224489797,
  '2': 0.3979591836734694,
  '3': 2.63265306122449,
  '4': 18.714285714285715,
  '5': 0.0,
  '6': 60.92857142857143,
  '7': 1.030612244897959,
  '8': 71.73469387755102,
  '9': 14.76530612244898},
 'v3': {'0': 3.642857142857143,
  '1': 21.551020408163264,
  '2': 0.8061224489795918,
  '3': 2.979591836734694,
  '4': 14.5,
  '5': 0.030612244897959183,
  '6': 39.60204081632653,
  '7': 0.7551020408163265,
  '8': 71.89795918367346,
  '9': 11.571428571428571},
 'v4': {'0': 6.1020408163265305,
  '1': 25.632653061224488,
  '2': 0.6938775510204082,
  '3': 3.9285714285714284,
  '4': 21.581632653061224,
  '5': 0.22448979591836735,
  '6': 10.704081632653061,
  '7': 0.8469387755102041,
  '8': 71.21428571428571,
  '9': 12.489795918367347}}

A chi-square test indicates that the rows and columns are dependent:

Chi-square statistic: 69.6630377155341
p-value: 1.2528156966101567e-05
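
The question does not show how these numbers were computed. As a rough illustration of what the statistic measures, here is a minimal pure-Python sketch; the helper name `chi_square_statistic` and the 2x2 table are made up for the example and are not the data above. It assumes every row and column total is positive.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows.

    Hypothetical helper for illustration; assumes all row and column totals are > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Small made-up table, not the contingency table from the question
stat, dof = chi_square_statistic([[10, 20], [30, 40]])
print(stat, dof)
```

For a 10x4 table like dummy_contingency the degrees of freedom would be (10 - 1) * (4 - 1) = 27. In practice `scipy.stats.chi2_contingency` does the same calculation and also returns the p-value.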

Now I fit the model:

import pandas as pd

dummy_contingency = pd.DataFrame(dummy_contingency)

ca_dummy = CA(n_components=2)  # Number of components for correspondence analysis
ca_dummy.fit(dummy_contingency)

And plot it:

fig = ca_dummy.plot(
    X=dummy_contingency)
fig

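How can I add labels to this plot? An example posted by someone else (Using mca package in Python) uses the function plot_coordinates(), which has an option for showing labels. However, that function no longer appears to be available in the prince package; plot() has to be used instead, and it has no option for labels. Thanks for your help.
Edit: example of output with labels: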

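The text next to each point in that plot ("strawberry", "banana", "yogurt", and so on) is the kind of label I am looking for. In my case the blue points should be labeled with the index values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the orange points with the column names "v1", "v2", "v3", "v4".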

beq87vna


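  • Adding annotations to a scatter plot follows How to do annotations with Altair; however, that answer does not cover the steps needed to get at the points plotted by ca.
  • To annotate a correspondence-analysis plot, .column_coordinates and .row_coordinates must be extracted from the ca model. These are the points shown on the plot, not the values in df.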
import pandas as pd
import prince
import altair as alt

# convert the dictionary of data to a dataframe
df = pd.DataFrame(dummy_contingency)

# create the model
ca = prince.CA()

# fit the model
ca = ca.fit(df)

# extract the column coordinate dataframe, and change the column names
cc = ca.column_coordinates(df).reset_index()
cc.columns = ['name', 'x', 'y']

# extract the row coordinates dataframe, and change the column names
rc = ca.row_coordinates(df).reset_index()
rc.columns = ['name', 'x', 'y']

# combine the dataframes
crc_df = pd.concat([cc, rc], ignore_index=True)

# plot and annotate
points = ca.plot(df)

annot = alt.Chart(crc_df).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot
  • Note that the plot already shows hover annotations (tooltips) even without adding annot.
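
The reshaping step (index to a name column, coordinates to x and y) can also be wrapped in a small helper. The sketch below uses pandas only; `to_label_frame`, the `kind` column, and the coordinate values are hypothetical stand-ins for the frames returned by `ca.row_coordinates(df)` and `ca.column_coordinates(df)`.

```python
import pandas as pd

# Made-up coordinates standing in for ca.row_coordinates(df) / ca.column_coordinates(df)
row_coords = pd.DataFrame({0: [0.10, -0.25], 1: [0.05, 0.30]}, index=['0', '1'])
col_coords = pd.DataFrame({0: [-0.40, 0.20], 1: [0.15, -0.10]}, index=['v1', 'v2'])


def to_label_frame(coords, kind):
    """Hypothetical helper: keep the first two components and expose the index as 'name'."""
    out = coords.iloc[:, :2].copy()      # first two components only
    out.columns = ['x', 'y']
    out = out.rename_axis('name').reset_index()
    out['kind'] = kind                   # 'row' or 'column', e.g. for styling labels
    return out


labels = pd.concat(
    [to_label_frame(row_coords, 'row'), to_label_frame(col_coords, 'column')],
    ignore_index=True,
)
print(labels)
```

Slicing with `.iloc[:, :2]` is a defensive choice: assigning three column names directly, as in `cc.columns = ['name', 'x', 'y']`, raises a length-mismatch error if the model was fitted with more than two components.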

  • The annotations can also be added without combining cc and rc into a single DataFrame.
points = ca.plot(df)

annot1 = alt.Chart(cc).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

annot2 = alt.Chart(rc).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot1 + annot2
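
For background on why the labels have to come from the model rather than from df: the plotted points are coordinates derived from a singular value decomposition of the standardized table, not the raw counts. The following is a minimal NumPy sketch of the textbook calculation (principal coordinates), with a made-up 3x3 table. The function name `correspondence_coordinates` is hypothetical, and the scaling and signs that prince uses internally may differ from this sketch.

```python
import numpy as np


def correspondence_coordinates(table, n_components=2):
    """Textbook CA via SVD (illustrative sketch, not prince's implementation).

    Assumes every row and column of the table has a positive total.
    Returns (row_coords, col_coords, inertia_per_axis).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses

    # Standardized residuals from the independence model r * c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates: scale singular vectors by singular values and masses
    rows = (U * sigma) / np.sqrt(r)[:, None]
    cols = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return rows[:, :n_components], cols[:, :n_components], sigma ** 2


# Made-up table, not the data from the question
table = [[10, 20, 5], [30, 40, 10], [5, 5, 25]]
rows, cols, inertia = correspondence_coordinates(table)
print(rows)
print(cols)
```

In this formulation the total inertia (the sum of the squared singular values) equals the chi-square statistic divided by the table total, which ties the plot back to the dependence test in the question.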
