Does the default variable trace plot of glmnet use standardized coefficients?

Problem description

Does the default variable trace plot of glmnet show standardized coefficients? How can I tell? If it does not, how do I get them?

set.seed(123)

lambdas <- 10^seq(3,-2,by = -.1)

cv.ridge <- cv.glmnet(x_train_r,y_train_r,alpha = 0,family = "binomial",lambda= lambdas)

plot(cv.ridge$glmnet.fit,"lambda",label=TRUE)

Trace plot with the coefficients. Are they standardized?

Solution

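The coefficients are not standardized; see this post as well. You can check this easily by multiplying the coefficients with the unstandardized predictor variables: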

library(glmnet)
library(mlbench)
data(Sonar)
# first 10 predictors, and the class recoded as 0/1
X=as.matrix(Sonar[,1:10])
y=as.numeric(Sonar$Class)-1
fit = cv.glmnet(X,y,alpha = 0,family = "binomial")

The scale of the plotted coefficients is too large for them to be standardized:

plot(fit$glmnet.fit,"lambda")

We can double-check:

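# coefficients (including the intercept) at lambda.1se
Co = coef(fit,s="lambda.1se")
# linear predictor computed by hand from the unstandardized predictors
our_pred = cbind(1,X) %*% as.matrix(Co)
# predict() takes the penalty value through `s`, not `lambda`
y_pred = predict(fit,X,s="lambda.1se")

table(our_pred == y_pred)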

TRUE 
 208

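So the coefficients are converted back to the original scale. To get standardized coefficients for visualization only, we can multiply each coefficient by the standard deviation of its predictor, since a coefficient on the original scale is the standardized coefficient divided by that standard deviation. For full details on obtaining the scaled coefficients, see the answer by @MatthewDury.

# sample standard deviation of each column
col_SD = apply(X,2,sd)

# rescale each row (one predictor) of the coefficient path by its SD
Co = sweep(as.matrix(fit$glmnet.fit$beta),1,col_SD,"*")
#cols = RColorBrewer::brewer.pal(nrow(Co),"Set3")
l = fit$glmnet.fit$lambda
names(l) = colnames(Co)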

library(ggplot2)
library(reshape2)
library(ggrepel)

df = melt(as.matrix(Co))
df$lambda = l[as.character(df$Var2)]

ggplot(df,aes(x=lambda,y=value,col=Var1)) +
  geom_line() + scale_x_log10() +
  geom_label_repel(data=subset(df,lambda==min(l)),aes(label=Var1),nudge_x=-0.1,show.legend=FALSE)
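
One way to sanity-check the link between original-scale and standardized coefficients is to refit on predictors that were standardized by hand and compare. The sketch below is not part of the original answer and rests on two assumptions: that glmnet's internal standardization divides by the population standard deviation (the 1/n formula), and that an explicit lambda grid (`lambdas`, chosen here purely for illustration) is passed to both fits so the paths line up. The helper `pop_sd` and the names `fit_raw`, `fit_std` and `max_gap` are hypothetical.

```r
library(glmnet)
library(mlbench)

data(Sonar)
X <- as.matrix(Sonar[, 1:10])
y <- as.numeric(Sonar$Class) - 1

# Population SD (divide by n); assumed to match glmnet's internal scaling
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))
sd_n <- apply(X, 2, pop_sd)

# Illustrative lambda grid, shared by both fits
lambdas <- 10^seq(1, -2, by = -0.1)

# Fit on the raw predictors; glmnet standardizes internally by default
fit_raw <- glmnet(X, y, alpha = 0, family = "binomial", lambda = lambdas)

# Fit on hand-standardized predictors with internal standardization off
X_std <- scale(X, center = TRUE, scale = sd_n)
fit_std <- glmnet(X_std, y, alpha = 0, family = "binomial",
                  standardize = FALSE, lambda = lambdas)

# Rescale the original-scale coefficients: beta_std = beta_raw * sd
beta_rescaled <- sweep(as.matrix(fit_raw$beta), 1, sd_n, "*")

# Largest absolute difference between the two coefficient paths
max_gap <- max(abs(beta_rescaled - as.matrix(fit_std$beta)))
print(max_gap)
```

If the assumptions hold, `max_gap` should be close to zero, which would support reading the original-scale coefficients times the predictor standard deviations as the standardized ones. For plotting alone, the difference between the n and n - 1 versions of the standard deviation is a constant factor and does not change the shape of the traces.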
