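Problem description
I've run into a problem with a specific regression (GLM, inverse Gaussian) in Python that I have previously run on different data. Below is the first dataset, which throws the error, followed by the same regression on a different dataset, which works. I haven't been able to isolate how to fix it. Interestingly, filtering to a different range of grades lets the regression run without problems: for example, using only grades 4-11 works, while grades 4-12 does not. See the attached screenshot: the range highlighted in green works in the regression, the orange range does not. For what it's worth, the problem also persists in R. I also tried manually changing some values to make the distribution less bimodal, which didn't help.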
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
# First dataset: errors are thrown.
data = {'Grade':['4','5','6','7','8','9','10','11','12','13','14','15'],'Salary': ['13068','9353','12682','13526','15280','18779.5','23520','26010','48231.5','48510','64400','89250']}
df = pd.DataFrame(data,columns=['Grade','Salary'])
df['Grade'] = df['Grade'].astype(int)
df['Salary'] = df['Salary'].astype(float)
df.dtypes
df.plot.scatter(x = 'Grade',y = 'Salary')
x = df['Grade']
y = df['Salary']
exog = sm.add_constant(x)
endog = y
model = sm.GLM(endog,exog,family = sm.families.InverseGaussian())
#errors thrown here
results = model.fit()
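# Second dataset: this data works with InverseGaussian.
# The Grade list was cut off in the source; it is assumed to be the same 4-15 range as above (12 salaries).
data = {'Grade':['4','5','6','7','8','9','10','11','12','13','14','15'],'Salary': ['29279','33696.5','36634','42507','43362.5','45999.5','53522','61559.9','69635','78734.5','91868.5','106071']}
df2 = pd.DataFrame(data,columns=['Grade','Salary'])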
df2['Grade'] = df2['Grade'].astype(int)
df2['Salary'] = df2['Salary'].astype(float)
df2.dtypes
df2.plot.scatter(x = 'Grade',y = 'Salary')
x = df2['Grade']
y = df2['Salary']
exog = sm.add_constant(x)
endog = y
model = sm.GLM(endog,exog,family = sm.families.InverseGaussian())
#no errors on df2
results = model.fit()
y_pred = pd.DataFrame(results.predict(exog))
print(results.summary())
This is the error message:
C:\Users\%user%\Anaconda3\lib\site-packages\statsmodels\genmod\families\links.py:319: RuntimeWarning: invalid value encountered in power
return np.power(z,1. / self.power)
Traceback (most recent call last):
File "<ipython-input-842-89f6e4763922>",line 6,in <module>
results = model.fit()
File "C:\Users\%user%\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py",line 1027,in fit
cov_kwds=cov_kwds,use_t=use_t,**kwargs)
File "C:\Users\%user%\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py",line 1165,in _fit_irls
check_weights=True)
File "C:\Users\%user%Anaconda3\lib\site-packages\statsmodels\regression\_tools.py",line 48,in __init__
raise ValueError(self.msg.format('weights'))
ValueError: NaN,inf or invalid value detected in weights,estimation infeasible.
The R error on the same data is:
Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In sqrt(eta) : NaNs produced
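Both messages point at the same step. The default link for the inverse Gaussian family is the inverse-squared link, eta = 1/mu^2, so the fitted mean is recovered as mu = eta^(-1/2). That is the `np.power(z,1. / self.power)` call in the statsmodels warning and the `sqrt(eta)` in the R warning. As soon as the linear predictor goes negative for any row, the mean (and then the weights) becomes NaN. The sketch below is only a rough illustration of that mechanism, not the IRLS procedure either library actually runs: it fits an ordinary unweighted straight line (via `np.polyfit`, chosen here for simplicity) to `1 / Salary**2` for the first dataset and shows where that line drops below zero.

```python
import numpy as np

# First dataset from the question.
grade = np.arange(4, 16, dtype=float)
salary = np.array([13068, 9353, 12682, 13526, 15280, 18779.5,
                   23520, 26010, 48231.5, 48510, 64400, 89250])

# Under the inverse-squared link the linear predictor is eta = 1 / mu**2.
eta_observed = 1.0 / salary**2

# Illustrative assumption: an unweighted least-squares line of eta on grade.
# The real fit uses iteratively reweighted least squares instead.
slope, intercept = np.polyfit(grade, eta_observed, 1)
eta_line = intercept + slope * grade

# Inverse link: mu = eta ** (-1/2). A negative eta has no real-valued mean,
# so NumPy returns nan (the warning is silenced here on purpose).
with np.errstate(invalid="ignore"):
    mu_line = np.power(eta_line, -0.5)

for g, e, m in zip(grade, eta_line, mu_line):
    print(f"grade {int(g):2d}  eta {e: .3e}  mu {m:.1f}")
```

With this dataset the straight line is negative at the highest grades, so the implied means there come out as nan. That would be consistent with the fit working on a truncated grade range and failing once the higher grades are included.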
Maybe I need to try a different approach here?