Plotly: How to display regression errors with lines between the observations and the regression line?

Problem description

I generated a Plotly chart in Python by fitting a regression to a limited set of points, which gave me this figure:

[image]

I would like to draw a vertical line between each of those points and the fitted curve, as in this example:

[image]

I am using plotly.graph_objects and pandas to generate these figures, but I don't know how to draw those lines. This is the code I am using:

import pandas as pd
import plotly.graph_objects as go

for point,curve in zip(points,curves):

    point_plot = go.Scatter(x=df['Duration'],
                            y=df[point],
                            name=point,
                            # text=df['Nome'],
                            mode='markers+text',
                            line_color=COLOR_CODE[point],
                            textposition='top center')

    line_plot = go.Scatter(x=df['Duration'],y=df[curve],name='',mode='lines')
    

    # XXX: this doesn't solve the problem, but it's what I could think of for now
    to_bar = df[points].diff(axis=1).copy()
    to_bar['Nome'] = df['Nome']
    bar_plot = go.Bar(x=to_bar['Nome'],y=to_bar[point],marker_color=COLOR_CODE[point])

                            
    fig.add_trace(line_plot,row=1,col=1)
    fig.add_trace(point_plot,row=1,col=1)
    fig.add_trace(bar_plot,row=2,col=1)

If you think it is necessary, I can share the DataFrame I used to generate this figure. Any help is welcome.

Solution

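You haven't provided a working code snippet with a data sample, so I'll base my suggestion on one of my earlier answers, Plotly: How to plot a regression line using plotly?. If your figure is limited to two series, as in your example, you can:

1. retrieve the x-values from one of the series using xVals = fig.data[0]['x'], and

2. organize all points for the regression line and the observation markers in a dict, errors = {}, and

3. populate that dict using: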

for d in fig.data:
    errors[d['mode']]=d['y']

4. Then you can add line shapes for the distances between the line and the markers (your errors) using:

for i,x in enumerate(xVals):
    shapes.append(go.layout.Shape(type="line",[...])
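Step 4 can also be sketched as a small helper that returns plain dicts, which fig.update_layout(shapes=...) accepts in place of go.layout.Shape objects. The function name residual_shapes and the sample numbers are made up for illustration and are not part of the original answer; the styling values mirror the complete code in this answer.

```python
def residual_shapes(x_vals, observed, fitted, color="black", width=1):
    """Return one vertical line shape (as a plain dict) per observation.

    Hypothetical helper: each shape runs from the observed marker to the
    fitted value at the same x position.
    """
    shapes = []
    for x, y_obs, y_fit in zip(x_vals, observed, fitted):
        shapes.append(dict(
            type="line",
            x0=x, y0=y_obs,   # start at the observed marker
            x1=x, y1=y_fit,   # end on the regression line at the same x
            line=dict(color=color, width=width),
            opacity=0.5,
            layer="above",
        ))
    return shapes

# made-up sample values, for illustration only
demo = residual_shapes([1, 2, 3], [2.0, 3.5, 5.0], [2.2, 3.4, 4.6])
print(len(demo), demo[0]["y0"], demo[0]["y1"])
```

With a figure built as in this answer, the call would presumably look like fig.update_layout(shapes=residual_shapes(xVals, errors['markers'], errors['lines'])).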

Result:

[image]

Complete code:

import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20,high=20,size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20,high=20,size=numdays).cumsum()+100).tolist()

df = pd.DataFrame({'X': X,'Y':Y})

# regression
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y',x=df['X'],y=df['Y'].values,mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit',x=X,y=df['bestfit'],mode='lines'))


# plotly figure layout
fig.update_layout(xaxis_title = 'X',yaxis_title = 'Y')

# retrieve x-values from one of the series
xVals = fig.data[0]['x']

errors = {} # container for prediction errors

# organize data for errors in a dict
for d in fig.data:
    errors[d['mode']]=d['y']

shapes = [] # container for shapes

# make a line shape for each error == distance between each marker and line points
for i,x in enumerate(xVals):
    shapes.append(go.layout.Shape(type="line",
                                  x0=x,
                                  y0=errors['markers'][i],
                                  x1=x,
                                  y1=errors['lines'][i],
                                  line=dict(
                                      #color=np.random.choice(colors,1)[0],
                                      color = 'black',
                                      width=1),
                                  opacity=0.5,
                                  layer="above")
                 )

# include shapes in layout
fig.update_layout(shapes=shapes)
fig.show()
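If the number of points is large, one shape per observation can get heavy. A different technique from the one above, sketched here as an assumption rather than as part of the original answer, is to draw all error segments as a single line trace and separate the segments with None, which Plotly renders as gaps in a line. The helper name residual_segments and the sample values are hypothetical.

```python
def residual_segments(x_vals, observed, fitted):
    """Build x/y lists for one trace holding every error segment.

    Hypothetical helper: each segment is (x, observed) -> (x, fitted),
    followed by None so that consecutive segments are not joined.
    """
    xs, ys = [], []
    for x, y_obs, y_fit in zip(x_vals, observed, fitted):
        xs.extend([x, x, None])
        ys.extend([y_obs, y_fit, None])
    return xs, ys

# made-up sample values, for illustration only
seg_x, seg_y = residual_segments([1, 2], [2.0, 3.5], [2.2, 3.4])
print(seg_x)
print(seg_y)
```

The two lists could then be passed to a single go.Scatter(x=seg_x, y=seg_y, mode='lines', showlegend=False) trace. This trace also has mode 'lines', so it should be added only after the errors dict has been filled; otherwise it would overwrite the regression line's entry.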