Question about curve fitting with ODR

Problem description

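I am trying to find correlations among 8 characteristics that describe a set of observational data. I chose SciPy's orthogonal distance regression (ODR) implementation because it can handle errors in both variables. The problem is that not all of these characteristics have errors associated with their values and, apart from a very few combinations, there is no theoretical basis for predicting how, or even whether, they are correlated. Discovering that is actually the point of my research. I have run into problems, and I need to know whether they come from my implementation or whether there is a deeper issue.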

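First, I will post some code from my implementation so that I can refer to it. This is not a minimal working example; it is perhaps the most complex code I have written, and I do not know how to reduce it to something reasonable, but it should give you an idea of how I have implemented ODR.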

import numpy as np
import matplotlib.pyplot as plt
from scipy.odr import ODR,Model,Data,RealData
%matplotlib notebook

# Define a linear function to fit.
def linear_func(C,x):
    return C[0] * x + C[1]

# Now we need to determine the actual correlations.  Loop over the combinations and find the functional form of the
# correlation.
for keys in key_list:

    ### SORT DATA ###

    # Create an array to hold the data for this combination of characteristics.
    data_array = np.ones((n_rows,4))

    # Determine the first columns for both characteristics.
    a = first_column_dict[keys[0]]
    b = first_column_dict[keys[1]]

    # Put the raw data for the two characteristics into their proper place.
    data_array[:,0] = sample_data[:,a]
    data_array[:,2] = sample_data[:,b]

    # Now fill in the weights (if such exist).  We will need to find an "average" error if there are unequal errors.
    # Start with the first characteristic.
    if keys[0] in one_error_list:
        data_array[:,1] = sample_data[:,a + 1]
    elif keys[0] in two_error_list:
        data_array[:,1] = (sample_data[:,a + 1] + sample_data[:,a + 2]) / 2

    # Now do the same with the second characteristic.
    if keys[1] in one_error_list:
        data_array[:,3] = sample_data[:,b + 1]
    elif keys[1] in two_error_list:
        data_array[:,3] = (sample_data[:,b + 1] + sample_data[:,b + 2]) / 2

    # Define a mask to remove rows with values of NaN.
    mask = ~np.isnan(data_array[:,0]) & ~np.isnan(data_array[:,2])

    ### DETERMINE CORRELATIONS ###

    # Define the data being fit.
    data = RealData(data_array[mask][:,0],data_array[mask][:,2],sx=data_array[mask][:,1],sy=data_array[mask][:,3])

    # Define the model being fit.
    model = Model(linear_func)

    # Define the ODR that will be used to fit the data.
    odr_obj = ODR(data,model,beta0=[1,0])
    odr_obj.set_job(fit_type=0,deriv=0)

    # Run the model.
    fit = odr_obj.run()
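
Since the code above depends on variables that are not shown (key_list, sample_data, first_column_dict and so on), here is a minimal, self-contained sketch of the same ODR steps for a single pair of variables. The synthetic data, the true line y = 2x + 1 and the error sizes are all assumptions made up for illustration; they are not taken from the real data set.

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

# Same linear model as in the question.
def linear_func(C, x):
    return C[0] * x + C[1]

# Hypothetical stand-in data: a true line y = 2x + 1 with noise in both variables.
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 50)
sx = np.full(x_true.shape, 0.2)   # assumed 1-sigma errors on x
sy = np.full(x_true.shape, 0.5)   # assumed 1-sigma errors on y
x_obs = x_true + rng.normal(0.0, sx)
y_obs = 2.0 * x_true + 1.0 + rng.normal(0.0, sy)

# Build the data and model objects and run the fit, as in the loop body above.
data = RealData(x_obs, y_obs, sx=sx, sy=sy)
odr_obj = ODR(data, Model(linear_func), beta0=[1.0, 0.0])
odr_obj.set_job(fit_type=0, deriv=0)
fit = odr_obj.run()

# Inspect the result: parameters, their standard errors, and why the fit stopped.
print("slope, intercept:", fit.beta)
print("standard errors: ", fit.sd_beta)
print("residual variance:", fit.res_var)
print("stop reason:", fit.stopreason)
```

Checking fit.stopreason and fit.sd_beta for each pair may help separate fits that did not converge from fits that converged to a poor solution.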

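I am just trying to find fit equations for the pairs of characteristics I have identified as correlated (using the Spearman rank correlation coefficient). Depending on how I set up the code to run the ODR, the fits it finds are sometimes plainly absurd. One example is a pair of characteristics in which the x-axis variable has errors associated with it and the y-axis variable does not. I enter the known x errors and set the y-variable error values to 1; according to the documentation, this should set the y weights to 1 and make the x weights small, given the x errors. I suspect this may be the cause of the problem, but I do not know how to fix it. Here is a plot of the data with the fit I just described: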

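This clearly makes no sense; the fit is completely wrong. However, if I set all the weights on both axes to 1, I get the following correlation plot: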

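This is a much better fit, but it ignores the errors, which are one of the reasons I chose ODR in the first place! If I set fit_type to 2 in the set_job call, forcing the program to do an ordinary least-squares fit, I get something that looks similar to the second plot: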

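This looks like about as good a fit, but I do not know whether it is any better or more accurate than my second result. Which one should I trust more? To make matters worse, not all pairs of characteristics follow this trend of the fit improving when the errors are removed. Some fits only make sense when I fully include the errors, as in the first example. Also, not all of the characteristics are linearly correlated; some are clearly logarithmically correlated.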
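
For the logarithmically correlated pairs, one possible approach is to swap the model function and leave the rest of the ODR setup unchanged. The sketch below assumes a relation of the form y = C[0] * log10(x) + C[1] with strictly positive x. The function name log_func, the synthetic data and the error sizes are hypothetical and only illustrate the mechanics.

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

# Hypothetical logarithmic model; only valid for x > 0.
def log_func(C, x):
    return C[0] * np.log10(x) + C[1]

# Made-up data following y = 3 * log10(x) - 1 with noise in both variables.
rng = np.random.default_rng(1)
x_true = np.logspace(0.5, 2.0, 60)   # roughly 3 to 100
sx = 0.02 * x_true                   # assumed 2% relative error on x
sy = np.full(x_true.shape, 0.05)     # assumed absolute error on y
x_obs = x_true + rng.normal(0.0, sx)
y_obs = 3.0 * np.log10(x_true) - 1.0 + rng.normal(0.0, sy)

data = RealData(x_obs, y_obs, sx=sx, sy=sy)
fit = ODR(data, Model(log_func), beta0=[1.0, 0.0]).run()

print("coefficient, offset:", fit.beta)
print("standard errors:    ", fit.sd_beta)
```

An alternative would be to keep linear_func and fit against log10(x) directly, in which case the x errors would need to be propagated into log space first.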

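I have two questions about this. Should I be using ODR at all, or is there a more appropriate routine for determining these correlations? And is my way of specifying the errors correct, especially when one of the characteristics has no associated errors?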
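
On the second question, it may help to look at the weights that RealData actually derives from sx and sy. According to the SciPy documentation, standard deviations are converted to weights as 1 divided by their squares, exposed as the wd (x) and we (y) attributes. The numbers below are invented for illustration; the point is only that a placeholder error of 1 is expressed in the units of y, so the balance between x and y weights depends on the scale of the data.

```python
import numpy as np
from scipy.odr import RealData

# Hypothetical values: x has measured errors, y has none.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
sx = np.array([0.01, 0.02, 0.01, 0.02])  # assumed measured x errors
sy = np.ones_like(y)                      # placeholder "error" of 1 on y

data = RealData(x, y, sx=sx, sy=sy)

# RealData converts standard deviations to weights as 1 / s**2.
print("wd (x weights):", data.wd)
print("we (y weights):", data.we)
print("ratio wd / we: ", data.wd / data.we)
```

With these made-up errors, the x weights come out far larger than the y weights, so the fit would treat x as nearly exact. Printing wd and we for a problematic pair in the real data would show which way the balance tips there.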


curve-fitting python scipy