LightGBM returns negative probabilities

Problem description

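I have been working on a LightGBM prediction model that estimates how likely something is to happen. I scaled the data with a min-max scaler, saved the scaler, and trained the model on the scaled data.

Later, in real time, I load the previously saved model and scaler and try to predict the likelihood for a new entry. For some reason, I get a negative probability.

Here is the code: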

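import joblib
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Model Vars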
learning_rate = 0.005
boosting_type = 'gbdt'
objective = 'rmse'
metric = ['balanced_accuracy_score','rmse','auc']
sub_feature = 0.5721
num_leaves = 3000
min_data = 500
max_depth = 22
max_bin = 12000


def createLGBM():
   # Read dataset
   dataset = pd.read_csv(r'C:\FullDatasetNoArson.csv')  # raw string keeps the backslash literal
   x = dataset.values  # returns a numpy array

   # normalize Data
   min_max_scaler = preprocessing.MinMaxScaler()
   x_scaled = min_max_scaler.fit_transform(x)
   
   # Save scaler
   joblib.dump(min_max_scaler,"Saved_Scaler")

   dataset = pd.DataFrame(x_scaled)
   X = dataset.iloc[:,0:8].values
   y = dataset.iloc[:,8].values
   
   # Initiate LGBM
   x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=0)
   d_train = lgb.Dataset(x_train,label=y_train)
   params = {
       'task': 'train',
       'learning_rate': learning_rate,
       'boosting_type': boosting_type,
       'objective': objective,
       'metric': metric,
       'sub_feature': sub_feature,
       'num_leaves': num_leaves,
       'min_data': min_data,
       'max_depth': max_depth,
       'max_bin': max_bin,
       'num_threads': 7,
       'is_training_metric': True,
       'verbose': 1,
   }

   clf = lgb.train(params,d_train,2000,keep_training_booster=True)

   # Prediction
   y_pred = clf.predict(x_test)

   # convert into binary values (threshold at .5)
   y_pred = np.where(y_pred >= .5, 1.0, 0.0)

   # Accuracy (scikit-learn expects the true labels first)
   accuracy = accuracy_score(y_test,y_pred)

   clf.save_model("lgb-model_" + str(accuracy) + ".txt")

   return "lgb-model_" + str(accuracy) + ".txt"


def predict(data):
   data = np.asarray(data)
   print(data)
   min_max_scaler = joblib.load("Saved_Scaler")
   data = min_max_scaler.transform(data)
   data = data[:,0:8]
   model = lgb.Booster(model_file='lgb-model_0.7906763418553157.txt')
   print(data)
   pred = model.predict(data)
   return pred.tolist()

Solution

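After reading more on the topic, I realized that a decision-tree regression model can legitimately return negative values, so the output itself is expected behavior: with a regression objective such as 'rmse', nothing constrains the predictions to the range [0, 1].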
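To make that concrete, here is a minimal, simplified sketch of why this happens. A boosted regression model adds up leaf values from its trees, and that sum has no lower or upper bound, whereas a binary log-loss model passes the same kind of sum through a logistic (sigmoid) function. The leaf values below are made up for illustration and are not taken from my model.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Map an unbounded raw score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


# Hypothetical leaf values that four trees contribute for one sample.
leaf_values = [0.10, -0.25, -0.05, 0.02]

# Regression-style output: the plain sum, which can be negative.
raw_score = sum(leaf_values)

# Classification-style output: the same sum squashed into (0, 1).
probability = sigmoid(raw_score)

print("raw regression output:", raw_score)
print("sigmoid of the same score:", probability)
```

The raw sum here is negative, while the sigmoid of it is a valid probability.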

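In case anyone else runs into this, my mistake was trying to use regression to compute the probability of something happening. The correct way to estimate that kind of probability is binary log-loss classification (or log-loss regression). In LightGBM, that means setting objective = 'binary' instead of 'rmse'.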