猜测将 Voigt 曲线拟合到数据 - Python 脚本的行为不规律

问题描述

让我描述一下我正在尝试做的事情。这需要比我更了解 Python 的人的眼睛。

我有一组数据(实际上是样品中的沉积物直径与百分比),并且在绘制时显示了一个独特的光谱。我假设数据中隐藏了“模式”,并试图强制拟合 voigt、guassian 或 lorentzian 曲线,以便提取一些信息。这个脚本的框架来自一个对 XRD 数据做类似事情的人。我不够熟练,无法真正理解脚本是如何实现目标的,所以我无法区分一些奇怪的行为。我先把奇怪的地方概述一下,然后我来分享代码。

  1. 如果我使用相同的数据一遍又一遍地运行代码,结果并不总是相同。不仅如此,而且可能有 25% 的时间,我会收到一个我无法弄清楚的错误。为什么会发生此错误,为什么只在某些时候发生?

TypeError: 不支持的操作数类型 -: 'tuple' 和 'float'

  1. 当我在代码的开头定义“spec”时,我必须指定模型类型。一次偶然的机会,我首先尝试了 VoigtModel,而且它在大多数情况下都有效。但是,如果我将类型指定为 Gaussian 或 Lorentzian,则脚本根本不会运行:

TypeError: 不能将序列乘以类型为 'float' 的非整数

  1. 在脚本中,我要求它打印一些关于它适合的曲线的信息。具体来说,就是曲线峰值的 x、y 值。但是,当我随后运行它时,它可能适合不同的曲线,但 print() 输出不会改变。比如,什么?

如果有人可以尝试一下代码,或许能提供一些关于这段代码有什么问题的见解,我将不胜感激。

edit 我发现如果我添加更多 {'type': 'VoigtModel'} 到 spec = ,脚本失败的频率会降低。如果我删除一些(留下一两个),那么它以更大的百分比失败。仍然可以使用一些帮助来理解连接。

代码:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
import random

from lmfit import models

x = 0,0.09326263,0.186541806,0.279826296,0.373096863,0.466372043,0.559644359,0.652910952,0.746190193,0.839463682,0.932734784,1.026014714,1.119288717,1.212558343,1.305836463,1.399111865,1.492381488,1.585657384,1.678931325,1.772207061,1.865478378,1.958752334,2.05202538,2.145299504,2.238574433,2.331847735,2.425123471,2.518395825,2.611671451,2.704945386,2.798218396,2.891491964,2.984766114,3.078040106,3.171314505,3.264585057,3.357863555,3.451137678,3.544409886,3.637684839,3.730956661,3.824229504,3.917507936,4.010781777,4.104055591,4.197326,4.290603266,4.383874926,4.477149297,4.57042345,4.663698494,4.756972396,4.850245469,4.943519232,5.036793499,5.13006734,5.223340556,5.316615186,5.409888929,5.503163537,5.596438512,5.689708905,5.782986369,5.876257098,5.969532028,6.062807987,6.156078156,6.249352461,6.342627453,6.43590194,6.529177933,6.622450379,6.715725752,6.808997914,6.902272777,6.995546352,7.088819796,7.18209372,7.275367937,7.36864248,7.461916216,7.555189618,7.648464489,7.741737739,7.835015624,7.928288902,8.021559911,8.114833257,8.208110415,8.301378965,8.394658258,8.487929146,8.581205011,8.674478952,8.767749555,8.861024001,8.954299075,9.047574353,9.140848269,9.234120373,9.327394253,9.420668151,9.513942544,9.607217038,9.700491238,9.793764758,9.887039268,9.980313168,10.0735868,10.16686092,10.26013875,10.35340805,10.44668356,10.53995856,10.63323182,10.72650553
y = 0.001352,0.001721,0.002661,0.00523,0.010879,0.020142,0.030427,0.039188,0.046922,0.055438,0.065352,0.076432,0.089913,0.107888,0.132296,0.164797,0.208043,0.266067,0.343688,0.443698,0.565158,0.704086,0.854979,1.01437,1.17932,1.34739,1.51366,1.67215,1.81638,1.94147,2.0432,2.11934,2.16792,2.19005,2.18907,2.17172,2.14565,2.11866,2.09749,2.08736,2.09102,2.1084,2.13739,2.17478,2.21729,2.26139,2.30342,2.33966,2.36671,2.38045,2.37413,2.33769,2.26088,2.13908,1.9769,1.78619,1.57832,1.35944,1.13483,0.919488,0.743312,0.637312,0.615423,0.665356,0.744581,0.78791,0.743882,0.617121,0.46602,0.356204,0.320677,0.361725,0.45788,0.566712,0.650727,0.701846,0.739237,0.788714,0.863346,0.956347,1.04314,1.09353,1.0874,1.02493,0.925497,0.815472,0.721377,0.658056,0.628985,0.623906,0.617012,0.578717,0.487132,0.346259,0.185964,0.066494,0.011942,0.000815,0
#xlog = [math.log(xval) for xval in x]

spec = {
    'x': x,'y': y,'model': [
        {'type': 'VoigtModel'},{'type': 'VoigtModel'},]}

plt.plot(spec['x'],spec['y'])
plt.show()


def update_spec_from_peaks(spec,model_indicies,peak_widths=(1,50),**kwargs):
    x = spec['x']
    y = spec['y']
    x_range = np.max(x) - np.min(x)
    peak_indicies = signal.find_peaks_cwt(y,peak_widths)
    np.random.shuffle(peak_indicies)
    for peak_indicie,model_indicie in zip(peak_indicies.tolist(),model_indicies):
        model = spec['model'][model_indicie]
        if model['type'] in ['GaussianModel','LorentzianModel','VoigtModel']:
            params = {
                'height': y[peak_indicie],'sigma': x_range / len(x) * np.min(peak_widths),'center': x[peak_indicie]
            }
            if 'params' in model:
                model.update(params)
            else:
                model['params'] = params
    return peak_indicies

# 
peaks_found = update_spec_from_peaks(spec,[0],peak_widths=(5,))    
print(peaks_found)

for i in peaks_found:
 print(x[i],y[i])


def generate_model(spec):
    composite_model = None
    params = None
    x = spec['x']
    y = spec['y']
    x_min = np.min(x)
    x_max = np.max(x)
    x_range = x_max - x_min
    y_max = np.max(y)
    for i,basis_func in enumerate(spec['model']):
        prefix = f'm{i}_'
        model = getattr(models,basis_func['type'])(prefix=prefix)
        if basis_func['type'] in ['GaussianModel','VoigtModel']: # for now VoigtModel has gamma constrained to sigma
            model.set_param_hint('sigma',min=1e-6,max=x_range)
            model.set_param_hint('center',min=x_min,max=x_max)
            model.set_param_hint('height',max=1.1*y_max)
            model.set_param_hint('amplitude',min=1e-6)
            # default guess is horrible!! do not use guess()
            default_params = {
                prefix+'center': x_min + x_range * random.random(),prefix+'height': y_max * random.random(),prefix+'sigma': x_range * random.random()
            }
        else:
            raise NotImplemented(f'model {basis_func["type"]} not implemented yet')
        if 'help' in basis_func:  # allow override of settings in parameter
            for param,options in basis_func['help'].items():
                model.set_param_hint(param,**options)
        model_params = model.make_params(**default_params,**basis_func.get('params',{}))
        if params is None:
            params = model_params
        else:
            params.update(model_params)
        if composite_model is None:
            composite_model = model
        else:
            composite_model = composite_model + model
    return composite_model,params


model,params = generate_model(spec)
output = model.fit(spec['y'],params,x=spec['x'])
fig,ax = plt.subplots()
ax.scatter(spec['x'],spec['y'],s=4)
components = output.eval_components(x=spec['x'])
print(len(spec['model']))
for i,model in enumerate(spec['model']):
    ax.plot(spec['x'],components[f'm{i}_'])```

解决方法

很明显,当给定相同的输入时,任何代码都会每次运行完全相同。

拟合似乎表现不稳定,因为您故意提供不稳定的数据。真的,您告诉它随机化初始起始值。而且您正在以编程方式设置边界,而不是检查初始值与边界的接近程度。那么,问问自己,你为什么要做这些事情?

你的代码看起来很复杂,可能复杂到你看不懂。从摆脱所有垃圾开始。也许制作一个高斯总和的模型,也许是这样的(代码会运行,给出合适的拟合):

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

from lmfit import models

x = np.array([0,0.09326263,0.186541806,0.279826296,0.373096863,0.466372043,0.559644359,0.652910952,0.746190193,0.839463682,0.932734784,1.026014714,1.119288717,1.212558343,1.305836463,1.399111865,1.492381488,1.585657384,1.678931325,1.772207061,1.865478378,1.958752334,2.05202538,2.145299504,2.238574433,2.331847735,2.425123471,2.518395825,2.611671451,2.704945386,2.798218396,2.891491964,2.984766114,3.078040106,3.171314505,3.264585057,3.357863555,3.451137678,3.544409886,3.637684839,3.730956661,3.824229504,3.917507936,4.010781777,4.104055591,4.197326,4.290603266,4.383874926,4.477149297,4.57042345,4.663698494,4.756972396,4.850245469,4.943519232,5.036793499,5.13006734,5.223340556,5.316615186,5.409888929,5.503163537,5.596438512,5.689708905,5.782986369,5.876257098,5.969532028,6.062807987,6.156078156,6.249352461,6.342627453,6.43590194,6.529177933,6.622450379,6.715725752,6.808997914,6.902272777,6.995546352,7.088819796,7.18209372,7.275367937,7.36864248,7.461916216,7.555189618,7.648464489,7.741737739,7.835015624,7.928288902,8.021559911,8.114833257,8.208110415,8.301378965,8.394658258,8.487929146,8.581205011,8.674478952,8.767749555,8.861024001,8.954299075,9.047574353,9.140848269,9.234120373,9.327394253,9.420668151,9.513942544,9.607217038,9.700491238,9.793764758,9.887039268,9.980313168,10.0735868,10.16686092,10.26013875,10.35340805,10.44668356,10.53995856,10.63323182,10.72650553])

y = np.array([0.001352,0.001721,0.002661,0.00523,0.010879,0.020142,0.030427,0.039188,0.046922,0.055438,0.065352,0.076432,0.089913,0.107888,0.132296,0.164797,0.208043,0.266067,0.343688,0.443698,0.565158,0.704086,0.854979,1.01437,1.17932,1.34739,1.51366,1.67215,1.81638,1.94147,2.0432,2.11934,2.16792,2.19005,2.18907,2.17172,2.14565,2.11866,2.09749,2.08736,2.09102,2.1084,2.13739,2.17478,2.21729,2.26139,2.30342,2.33966,2.36671,2.38045,2.37413,2.33769,2.26088,2.13908,1.9769,1.78619,1.57832,1.35944,1.13483,0.919488,0.743312,0.637312,0.615423,0.665356,0.744581,0.78791,0.743882,0.617121,0.46602,0.356204,0.320677,0.361725,0.45788,0.566712,0.650727,0.701846,0.739237,0.788714,0.863346,0.956347,1.04314,1.09353,1.0874,1.02493,0.925497,0.815472,0.721377,0.658056,0.628985,0.623906,0.617012,0.578717,0.487132,0.346259,0.185964,0.066494,0.011942,0.000815,0])


peaks = signal.find_peaks_cwt(y,(1.5,25))             
xstep = x.ptp() / len(x)

model,params = None,None

for i,peak_index in enumerate(peaks):
    this_model = models.GaussianModel(prefix=f'p{1+i:d}_')
    this_params = this_model.make_params(amplitude=y[peak_index],center=x[peak_index],sigma=2*xstep)
    if model is None:
        model = this_model
        params = this_params
    else:
        model += this_model
        params.update(this_params)
        
result = model.fit(y,params,x=x)
print(result.fit_report())

plt.plot(x,y,label='data')
plt.plot(x,result.best_fit,label='fit')
plt.legend()
plt.show()

需要比这复杂得多吗?嗯,也许不是。这提供了一个不错的合身,尽管它可能会在 x=7 附近丢失一个微妙的肩峰。

从简单开始。尽可能长时间保持简单。只有在简化其他事物时才增加复杂性。

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...