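Problem description
I am trying to run a logistic regression on diabetes data and look at the fitted model. I assumed each variable would have a single coefficient, but the result gave me 3 different lists of coefficients as well as 3 different intercepts. I tried the same thing with linear regression, and that gave 1 for each.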
import numpy as np
import pandas as pd
from sklearn import linear_model, preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv('diabetestype.csv', sep=',')
le = preprocessing.LabelEncoder()

# Pull each feature column out as a plain list
Age = list(data['Age'])
BSf = list(data['BS Fast'])
BSp = list(data['BS pp'])
PR = list(data['plasma R'])
PF = list(data['plasma F'])
Hb = list(data['HbA1c'])
# Encode the class labels in 'Type' as integers
Type = le.fit_transform(list(data['Type']))

# One tuple of six feature values per sample
X = list(zip(Age, BSf, BSp, PR, PF, Hb))
y = list(Type)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# model = linear_model.LinearRegression()
model = LogisticRegression()
model.fit(x_train, y_train)
acc = model.score(x_test, y_test)
coef = model.coef_
inter = model.intercept_
prediction = model.predict(x_test)

for i in range(5):
    print('predicted ', prediction[i], 'variables ', x_test[i], 'actual', y_test[i])
print(acc)
print(coef, inter)
The result is:
predicted 1 variables (2,9,14,6,10) actual 1
predicted 2 variables (33,7,8,8) actual 2
predicted 0 variables (19,4,3,2,0) actual 0
predicted 0 variables (7,15,5,3) actual 0
predicted 0 variables (16,0) actual 0
1.0
[[-0.02543341 0.3763792 -0.2116062 -1.36365511 -0.87416662 -1.8448327 ]
[ 0.00940748 -1.12894486 1.50994009 1.1101098 1.23563738 -0.2574385 ]
[ 0.01602593 0.75256566 -1.29833389 0.25354531 -0.36147076 2.1022712 ]] [ 28.79209663 -19.24933782 -9.54275881]
C:\Users\nk\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:764: ConvergenceWarning: lbfgs Failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
Solution
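According to the scikit-learn documentation:
coef_: ndarray of shape (1, n_features) or (n_classes, n_features)
(the same applies to intercept_)
You have 3 classes, so you get 3 rows of coefficients and 3 intercepts.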
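One way to confirm the number of classes is to inspect the fitted LabelEncoder. This is a minimal sketch: the contents of the Type column are not shown in the question, so the label strings below are hypothetical stand-ins.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for data['Type']; the real values are not shown.
raw_labels = ["Type1", "Normal", "Type2", "Normal", "Type1", "Type2"]

le = LabelEncoder()
y = le.fit_transform(raw_labels)

print(le.classes_)    # the distinct original labels, in sorted order
print(np.unique(y))   # the integer codes those labels were mapped to

# With more than 2 classes, coef_ has one row per class.
n_classes = len(le.classes_)
print(n_classes)
```

If this prints 3 for your data, a coef_ of shape (3, 6) and an intercept_ of length 3 are what you should expect.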
This minimal example also has 3 classes:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X,y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X,y)
clf.predict(X[:2,:])
clf.predict_proba(X[:2,:])
clf.score(X,y)
set(y)  # >>> {0, 1, 2} --> there are 3 classes
clf.coef_
# >>> array([[-0.41874027,  0.96699274, -2.52102832, -1.08416599],
#            [ 0.53123044, -0.31473365, -0.20002395, -0.94866082],
#            [-0.11249017, -0.65225909,  2.72105226,  2.03282681]])
clf.coef_.shape  # >>> (3, 4)
clf.intercept_  # >>> array([  9.84028024,   2.21683511, -12.05711535])
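The model has to be able to tell which class a sample belongs to. Whichever class you test for, the result is a value between 0 and 1.
For example, the first row of coef_ is used to check whether the sample belongs to the first class (clf.classes_[0]), the second row to check the second class, and so on.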
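The row-per-class layout can be checked by hand. The sketch below reuses the iris example and computes one linear score per class from coef_ and intercept_; max_iter=1000 is my own choice to avoid a convergence warning, not something from the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# max_iter=1000 is an assumption, chosen only so the solver converges quietly.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Row i of coef_ and entry i of intercept_ belong to clf.classes_[i].
for label, weights, bias in zip(clf.classes_, clf.coef_, clf.intercept_):
    print("class", label, "weights", weights, "intercept", bias)

# One linear score per class for every sample: shape (n_samples, n_classes).
scores = X @ clf.coef_.T + clf.intercept_

# The predicted class is the one whose row gives the highest score.
manual_pred = clf.classes_[np.argmax(scores, axis=1)]
```

Here scores should match clf.decision_function(X) and manual_pred should match clf.predict(X), which is why a 3-class problem needs 3 sets of coefficients rather than 1.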