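Add error bars with custom upper and lower limits to a bar plot in Python

Question

I want to add the HDIs (highest density intervals) that I computed (the hdi_both, hdi_one and lower_upper columns in the df below) to a bar plot.

However, I cannot figure out how to add the error bars/CIs so that each error bar has custom upper and lower limits that are independent of the y value (here, the corresponding value in proportion_correct).

For example, the HDI for Exp. 1 with guesses_correct equal to both has a lower limit of 0.000000 and an upper limit of 0.130435, while proportion_correct is 0.000000.

All the options I have seen involve specifying the upper and lower limits relative to the value on the y axis, which is not what I want.

Any help would be much appreciated.

Thanks,

Ayala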

import os
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
 'exp': ['Exp. 1','Exp. 1','Exp. 2','Exp. 3','Exp. 4','Exp. 5','Collapsed','Collapsed'],'proportion_correct': [0.0,0.304347826,0.058823529000000006,0.31372549,0.047619048,0.333333333,0.12244898,0.428571429,0.367346939,0.082901554,0.35751295299999997],'guesses_correct': ['both','one','both','one'],'hdi_both': [0.0,0.130434783,0.0,0.078431373,0.1,0.08,0.081632653,0.005181347,0.051813472],'hdi_one': [0.130434783,0.47826087,0.156862745,0.41176470600000004,0.5,0.16,0.4,0.163265306,0.408163265,0.21761658,0.341968912],'lower_upper': ['lower','upper','lower','upper']
})

print(df.head())
Out[4]: 
      exp  proportion_correct guesses_correct  hdi_both   hdi_one lower_upper
0  Exp. 1            0.000000            both  0.000000  0.130435       lower
1  Exp. 1            0.304348             one  0.130435  0.478261       upper
2  Exp. 2            0.058824            both  0.000000  0.156863       lower
3  Exp. 2            0.313725             one  0.078431  0.411765       upper
4  Exp. 3            0.047619            both  0.000000  0.100000       lower
# Make bar plot
sns.barplot(x='exp',y='proportion_correct',hue='guesses_correct',data=df)

plt.ylim([0,0.5])
plt.xlabel('Experiment')
plt.ylabel('Proportion Correct')
plt.legend(title='Correct guesses',loc='upper right')
plt.axhline(y=0.277777,color='dimgray',linestyle='--')
plt.annotate(' chance\n one',(5.5,0.27))
plt.axhline(y=0.02777,linestyle='--')
plt.annotate(' chance\n both',(5.5,0.02))
# Show the plot
plt.show()

This is the bar plot to which I want to add the HDIs.


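Solution

Although you have computed the lower and upper bounds of the error bars as absolute values, error bars are conventionally specified as lower and upper errors around a given y value. The "relative" lengths of the error bars are easy to obtain, though: subtract the y value from the bounds you computed.

You can then plot them with plt.errorbar(). Note that this function requires all error values to be non-negative, which is why the differences are passed through abs() in the code further down.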
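To make the subtraction concrete, here is a minimal sketch of the conversion on its own. The helper name absolute_to_relative_errors is hypothetical (it is not part of matplotlib or seaborn), and the three sample rows are taken from the df.head() output of the second answer, where the bounds are stored as hdi_low and hdi_high. Unlike abs(), this version raises an error when a bar height lies outside its interval instead of silently flipping the sign.

```python
import numpy as np

def absolute_to_relative_errors(y, lower, upper):
    """Turn absolute interval bounds into the (2, n) array expected by yerr.

    Hypothetical helper for illustration: row 0 holds the distance from each
    y value down to its lower bound, row 1 the distance up to its upper bound.
    """
    y = np.asarray(y, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    below = y - lower
    above = upper - y
    # errorbar needs non-negative lengths, so y must lie inside [lower, upper]
    if (below < 0).any() or (above < 0).any():
        raise ValueError("every y value must lie within its [lower, upper] interval")
    return np.vstack([below, above])

# Three rows from the df.head() output (proportion_correct, hdi_low, hdi_high)
y = [0.0, 0.304348, 0.058824]
lower = [0.0, 0.130435, 0.0]
upper = [0.130435, 0.478261, 0.078431]

yerr = absolute_to_relative_errors(y, lower, upper)
print(yerr)
```

The resulting array can be passed directly as yerr= to plt.errorbar().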

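Since you are splitting the bars with hue=, you have to loop over the hue levels and account for the offset of each set of bars (by default -0.2 and +0.2 for two hue levels):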

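# Make bar plot
import numpy as np  # needed for np.arange below

x_col = 'exp'
y_col = 'proportion_correct'
hue_col = 'guesses_correct'
low_col = 'hdi_both'
high_col = 'hdi_one'
sns.barplot(x=x_col,y=y_col,hue=hue_col,data=df)

# One pass per hue level; with two levels the bars sit at -0.2 and +0.2
# around each integer x position
for (h,g),pos in zip(df.groupby(hue_col),[-0.2,0.2]):
    # Distances from the bar height to each bound, as an array of shape (2, n)
    err = g[[low_col,high_col]].subtract(g[y_col],axis=0).abs().T.values
    x = np.arange(len(g[x_col].unique()))+pos
    # fmt='none' draws only the error bars, without markers or a connecting line
    plt.errorbar(x=x,y=g[y_col],yerr=err,fmt='none',capsize=5,ecolor='k')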
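For readers who want to try the idea without seaborn, the following is a self-contained sketch in plain matplotlib, where the bar positions are computed explicitly instead of relying on seaborn's default offsets. It assumes the bounds are stored per row as hdi_low and hdi_high (the layout used in the second answer) and uses only the four rows for Exp. 1 and Exp. 2 shown in that answer's df.head() output; df_demo, offsets and the output file name are illustrative names.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four rows copied from the df.head() output (Exp. 1 and Exp. 2 only)
df_demo = pd.DataFrame({
    'exp': ['Exp. 1', 'Exp. 1', 'Exp. 2', 'Exp. 2'],
    'proportion_correct': [0.0, 0.304348, 0.058824, 0.313725],
    'guesses_correct': ['both', 'one', 'both', 'one'],
    'hdi_low': [0.0, 0.130435, 0.0, 0.156863],
    'hdi_high': [0.130435, 0.478261, 0.078431, 0.411765],
})

fig, ax = plt.subplots()
experiments = df_demo['exp'].unique()        # kept in order of appearance
offsets = {'both': -0.2, 'one': 0.2}         # one bar on each side of the tick

for level, offset in offsets.items():
    g = df_demo[df_demo['guesses_correct'] == level]
    x = np.arange(len(experiments)) + offset
    # Absolute bounds -> distances below and above the bar height, shape (2, n)
    yerr = np.vstack([g['proportion_correct'] - g['hdi_low'],
                      g['hdi_high'] - g['proportion_correct']])
    ax.bar(x, g['proportion_correct'], width=0.4, label=level)
    ax.errorbar(x, g['proportion_correct'], yerr=yerr,
                fmt='none', capsize=5, ecolor='k')

ax.set_xticks(np.arange(len(experiments)))
ax.set_xticklabels(experiments)
ax.set_xlabel('Experiment')
ax.set_ylabel('Proportion Correct')
ax.legend(title='Correct guesses', loc='upper right')
fig.savefig('hdi_bars.png')                  # or plt.show() in an interactive session
```

Because the x positions are built from the same offsets as the bars, the error bars stay centered even if the bar width or the number of hue levels is changed, as long as offsets is updated to match.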


I ended up drawing the error bars as vertical lines. Here is my code, in case it helps someone.

df = pd.DataFrame({'exp': ['Exp. 1','Exp. 1','Exp. 2','Exp. 3','Exp. 4','Exp. 5','Collapsed','Collapsed'],'proportion_correct': [0.0,0.304347826,0.058823529000000006,0.31372549,0.047619048,0.333333333,0.12244898,0.428571429,0.367346939,0.082901554,0.35751295299999997],'guesses_correct': ['both','one','both','one'],'hdi_low': [0.0,0.130434783,0.0,0.156862745,0.1,0.16,0.163265306,0.005181347,0.21761658],'hdi_high': [0.130434783,0.47826087,0.078431373,0.41176470600000004,0.5,0.08,0.4,0.081632653,0.408163265,0.051813472,0.341968912]
                  })
df.head()
Out[4]: 
      exp  proportion_correct guesses_correct   hdi_low  hdi_high
0  Exp. 1            0.000000            both  0.000000  0.130435
1  Exp. 1            0.304348             one  0.130435  0.478261
2  Exp. 2            0.058824            both  0.000000  0.078431
3  Exp. 2            0.313725             one  0.156863  0.411765
4  Exp. 3            0.047619            both  0.000000  0.100000

The axvlines and axhlines functions used below are taken from "How to draw vertical lines on a given plot in matplotlib". To keep this short, I do not reproduce them here.

    # Make bar plot
    x_col = 'exp'
    y_col = 'proportion_correct'
    hue_col = 'guesses_correct'
    low_col = 'hdi_low'
    high_col = 'hdi_high'
    plot = sns.barplot(x=x_col,y=y_col,hue=hue_col,data=df)
    plt.ylim([0,0.55])
    plt.yticks([0,0.2,0.3,0.5])
    plt.xlabel('Experiment')
    plt.ylabel('Proportion Correct')
    plt.legend(title='Correct guesses',loc='upper right')
    plt.axhline(y=0.277777,color='dimgray',linestyle='--')
    plt.annotate(' chance\n one',(5.65,0.27))
    plt.axhline(y=0.02777,linestyle='--')
    plt.annotate(' chance\n both',(5.65,0.02))
    # (lower, upper) HDI bounds for each bar, in row order
    lims_x = list(zip(df[low_col],df[high_col]))
    # x position of each bar: tick position -0.2 ('both') and +0.2 ('one')
    xss = [-0.2,0.2,0.8,1.2,1.8,2.2,2.8,3.2,3.8,4.2,4.8,5.2]
    # Flattened bounds: lower0, upper0, lower1, upper1, ...
    yss = [i for sub in lims_x for i in sub]
    # Horizontal extent of the caps: each bar's range twice (lower and upper cap)
    lims_y = [(x-0.1,x+0.1) for x in xss for _ in range(2)]
    # axvlines/axhlines are the helpers from the linked answer
    for xs,lim in zip(xss,lims_x):
        plot = axvlines(xs,lims=lim,color='black')
    for yx,lim in zip(yss,lims_y):
        plot = axhlines(yx,lims=lim,color='black')
    plt.show()

Here is the resulting plot.
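Since the axvlines and axhlines helpers are not reproduced above, a similar result can probably be obtained with matplotlib's built-in Axes.vlines and Axes.hlines, which take absolute start and end coordinates. The sketch below is only one possible way to do it: draw_absolute_errorbars and cap_halfwidth are hypothetical names, and the sample values are the four rows for Exp. 1 and Exp. 2 from the df.head() output above.

```python
import matplotlib.pyplot as plt
import numpy as np

def draw_absolute_errorbars(ax, x, lower, upper, cap_halfwidth=0.1, color='black'):
    """Draw error bars from absolute bounds (hypothetical helper, not a matplotlib API).

    A vertical stem runs from lower to upper at each x, and a short horizontal
    cap is drawn at both ends of every stem.
    """
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    stems = ax.vlines(x, lower, upper, colors=color)
    caps_low = ax.hlines(lower, x - cap_halfwidth, x + cap_halfwidth, colors=color)
    caps_high = ax.hlines(upper, x - cap_halfwidth, x + cap_halfwidth, colors=color)
    return stems, caps_low, caps_high

fig, ax = plt.subplots()
x = np.array([-0.2, 0.2, 0.8, 1.2])          # bar centers for Exp. 1 and Exp. 2
heights = [0.0, 0.304348, 0.058824, 0.313725]
hdi_low = [0.0, 0.130435, 0.0, 0.156863]
hdi_high = [0.130435, 0.478261, 0.078431, 0.411765]

ax.bar(x, heights, width=0.4)
stems, caps_low, caps_high = draw_absolute_errorbars(ax, x, hdi_low, hdi_high)
```

This avoids both the subtraction step and the hand-written list of cap limits, because the caps are derived from the same x positions as the stems.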