Problem description
library(tidyverse)
library(readxl)   # read_excel() lives here; library(tidyverse) does not attach it
library(caret)
library(glmnet)
creditdata <- read_excel("R bestanden/creditdata.xlsx")
df <- as.data.frame(creditdata)
df <- na.omit(df)
df$married <- as.factor(df$married)
df$graduate_school <- as.factor(df$graduate_school)
df$high_school <- as.factor(df$high_school)
df$default_payment_next_month <- as.factor(df$default_payment_next_month)
df$sex <- as.factor(df$sex)
df$single <- as.factor(df$single)
df$university <- as.factor(df$university)
set.seed(123)
training.samples <- df$default_payment_next_month %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- df[training.samples,]
test.data <- df[-training.samples,]
x <- model.matrix(default_payment_next_month~.,train.data)[,-1]
y <- ifelse(train.data$default_payment_next_month == 1,1,0)
cv.lasso <- cv.glmnet(x,y,alpha = 1,family = "binomial")
# glmnet() needs the response y as its second argument
lasso.model <- glmnet(x, y, alpha = 1, family = "binomial", lambda = cv.lasso$lambda.1se)
x.test <- model.matrix(default_payment_next_month ~.,test.data)[,-1]
probabilities <- lasso.model %>% predict(newx = x.test)
predicted.classes <- ifelse(probabilities > 0.5,"1","0")
observed.classes <- test.data$default_payment_next_month
mean(predicted.classes == observed.classes)
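Hi everyone,
I am new to R and have been trying to perform a logistic ridge regression using the exact code from this website: http://www.sthda.com/english/articles/36-classification-methods-essentials/149-penalized-logistic-regression-essentials-in-r-ridge-lasso-and-elastic-net/. My goal is to predict whether a customer will default on their credit card, and we have a dataset with both factor and numeric variables. The problem is that most of my probabilities are negative and less than -1, for example -2.6, -1.4 and so on. Does anyone know what is going wrong here?
Thanks in advance for your help!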
Solution
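Like glm, the predict function for glmnet returns predictions on the scale of the link function by default, and these are not probabilities. For a binomial model the link is the logit, so the values are log-odds, which is why many of them are negative (a log-odds of -2.6 corresponds to a probability of about 0.07).
To get predicted probabilities, add type = "response" to the predict call: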
probabilities <- lasso.model %>% predict(newx = x.test,type = "response")
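To see the difference between the two scales, here is a minimal sketch on simulated data. The data, the fixed lambda = 0.01 and the object names (fit, link_scale, prob_scale, pred_class) are assumptions made for illustration and are not taken from the credit dataset in the question.

```r
library(glmnet)

set.seed(123)
# Simulated data (hypothetical): 200 rows, 5 numeric predictors
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- rbinom(n, size = 1, prob = plogis(x[, 1] - 0.5 * x[, 2]))

# Lasso-penalized logistic regression with an arbitrary, fixed lambda
fit <- glmnet(x, y, family = "binomial", alpha = 1, lambda = 0.01)

# Default is type = "link": log-odds, which can be any real number
link_scale <- predict(fit, newx = x)
# type = "response": probabilities strictly between 0 and 1
prob_scale <- predict(fit, newx = x, type = "response")

range(link_scale)
range(prob_scale)

# The inverse logit (plogis) maps the link scale onto the response scale
all.equal(as.numeric(plogis(link_scale)), as.numeric(prob_scale))

# Class labels can also be requested directly (0.5 probability cutoff)
pred_class <- predict(fit, newx = x, type = "class")
```

Note that applying a 0.5 cutoff to link-scale values, as the code in the question does, is not the same as a 0.5 probability cutoff: a log-odds of 0.5 corresponds to a probability of about 0.62, so the cutoff should be applied after switching to type = "response".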