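Problem description
I am trying to plot the ROC curve for an SVM by following the example at https://rpubs.com/JanpuHou/359286, but I keep getting an error on the last line of my code. Here is the beginning of the dataset, from head(data):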
growth LogSales Age LogTA CoAge CoAge2 Reg DigMkt
1 No 15.87283 45 15.32751 8 64 0 1
2 Yes 16.05044 44 15.27176 7 49 0 1
3 Yes 15.36307 32 15.20180 3 9 1 0
4 Yes 15.09644 31 14.97866 2 4 1 0
5 Yes 16.90655 59 16.58810 11 121 1 0
6 Yes 16.45457 58 15.95558 10 100 1 0
My code:
split = sample.split(data,SplitRatio = 0.70)
training = subset(data,split==T)
testing = subset(data,split==F)
###Making growth last to allow for variable importance
###Fitting model
svm_Lin = svm(growth~.,data = training,kernel = "linear",cost =1,scale = T,probability = TRUE)
##Prediction
pred = predict(svm_Lin,testing)
table(predict = pred,truth = testing$growth)
confusionMatrix(table(pred,testing$growth))
###ROC Curve
library(ROCR)
p<- predict(svm_Lin,testing,type="decision")
pr<-prediction(p,testing$growth)
pref <- performance(pr,"tpr","fpr")
plot(pref)
When I run the line pr<-prediction(p,testing$growth), I get the following error message:
Error: Format of predictions is invalid. It Couldn't be coerced to a list.
Any help with a fix would be appreciated.
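Solution
I would suggest the following approach. The main problem is that the predictions returned for the svm model are of type factor, so the ROCR functions cannot work with them. I have made a few changes to your setup. Your data is binary, so you can keep the target variable as a factor with two levels. Then, in the ROCR part, you have to convert the factors to numeric values. With that change your code will work.
In addition, the sampling approach from the caTools package was producing NA values, so I used a similar approach from the rsample package instead. Here is the code.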
library(ROCR)
library(e1071)
library(rsample)
#Data
data <- structure(list(growth = c("Yes","Yes","No","No"),LogSales = c(15.36307,15.36307,16.05044,16.45457,16.90655,15.87283,15.87283),Age = c(32L,32L,44L,58L,59L,45L,45L),LogTA = c(15.2018,15.2018,15.27176,15.95558,16.5881,15.32751,15.32751),CoAge = c(3L,3L,7L,10L,11L,8L,8L),CoAge2 = c(9L,9L,49L,100L,121L,64L,64L),Reg = c(1L,1L,0L,0L),DigMkt = c(0L,1L
)),row.names = c("3","3.1","2","6","5","2.1","2.2","6.1","2.3","5.1","1","1.1","5.2","6.2","5.3","5.4","2.4","2.5","1.2","1.3"),class = "data.frame")
Now we format the target variable:
#Format objective var to have a factor
data$growth[data$growth=='No']<-0
data$growth[data$growth=='Yes']<-1
data$growth <- factor(data$growth,levels = c(0,1),labels = c(0,1))
The split, using rsample:
#Split
split <- initial_split(data,prop = 0.7,strata = 'growth')
#Create training and test set
training <- training(split)
testing <- testing(split)
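If rsample is not available, a stratified split can be approximated in base R. The following is only a minimal sketch: stratified_split is a hypothetical helper written for this illustration (it is not part of rsample or caTools), and the toy data frame is made up.

```r
# Hypothetical helper (not from rsample or caTools): stratified split in base R.
stratified_split <- function(df, target, prop = 0.7) {
  # Row indices grouped by the class of the target column
  idx_by_class <- split(seq_len(nrow(df)), df[[target]])
  # Within each class, sample a share `prop` of the rows for training
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(prop * length(idx)))]
  }), use.names = FALSE)
  list(training = df[train_idx, , drop = FALSE],
       testing  = df[-train_idx, , drop = FALSE])
}

set.seed(42)
# Made-up toy data: 10 rows per class
toy <- data.frame(growth = factor(rep(c(0, 1), each = 10)), x = rnorm(20))
parts <- stratified_split(toy, "growth", prop = 0.7)
table(parts$training$growth)  # class counts in the training part
table(parts$testing$growth)   # class counts in the test part
```

Sampling within each class keeps the class proportions roughly equal in both parts, which is the purpose of the strata argument in the rsample call above.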
We fit the model:
###Fitting model
svm_Lin = svm(growth~.,data = training,kernel = "linear",cost =1,scale = T,probability = TRUE,type="C-classification")
We predict on the test set:
###Predict for ROC Curve
testing$p <- predict(svm_Lin,testing,type="response")
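As a hedged aside: the call above returns hard class labels. Since the model is fitted with probability = TRUE, e1071 can also return class probabilities, which give ROCR a continuous score to work with. The sketch below is an alternative to the label-based approach in this answer, not a replacement for it. It uses synthetic data (sim, train_set and test_set are names made up for this illustration, not the poster's dataset) and assumes the positive class is labelled "1".

```r
library(e1071)
library(ROCR)

set.seed(1)
# Synthetic stand-in data, NOT the dataset from the question
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$growth <- factor(as.integer(sim$x1 + 0.5 * sim$x2 + rnorm(n, sd = 0.5) > 0),
                     levels = c(0, 1))
train_idx <- sample(n, size = 140)
train_set <- sim[train_idx, ]
test_set  <- sim[-train_idx, ]

svm_Lin <- svm(growth ~ ., data = train_set, kernel = "linear", cost = 1,
               scale = TRUE, probability = TRUE, type = "C-classification")

# Ask predict() for class probabilities; they are attached as an attribute
pred <- predict(svm_Lin, test_set, probability = TRUE)
prob_yes <- attr(pred, "probabilities")[, "1"]  # column for class "1"

# ROCR accepts the numeric scores together with the two-level factor labels
pr <- prediction(prob_yes, test_set$growth)
pref <- performance(pr, "tpr", "fpr")
plot(pref)
auc <- performance(pr, "auc")@y.values[[1]]
```

With continuous scores the curve has one point per distinct threshold instead of a single corner, so it is usually more informative than a curve built from 0/1 predictions.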
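Now we format the output variables so that they can be passed to the ROCR functions.
Because factor codes start at 1, class 1 is stored with code 2 and class 0 with code 1. Converting the factor to numeric and subtracting 1 maps the codes back to 0 and 1.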
#Format variables
testing$growth <- as.numeric(testing$growth)-1
testing$p <- as.numeric(testing$p)-1
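To make the conversion concrete, here is a small base R illustration on a made-up factor (f is just an example vector, not taken from the dataset):

```r
# Made-up two-level factor with levels "0" and "1"
f <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))

codes <- as.numeric(f)                     # internal codes: 1 2 2 1
zero_one <- as.numeric(f) - 1              # shifted to:     0 1 1 0
via_labels <- as.numeric(as.character(f))  # reads the labels: 0 1 1 0

print(codes)
print(zero_one)
print(via_labels)
```

Going through as.character() reads the labels themselves rather than the internal codes, so it does not depend on the order of the levels. Subtracting 1 only works here because the levels are exactly 0 and 1, in that order.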
Finally, we build the ROC curve:
#Build ROCR scheme
pr<-prediction(testing$p,testing$growth)
pref <- performance(pr,"tpr","fpr")
plot(pref)
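For intuition about what the "tpr" and "fpr" measures represent, the ROC points can be computed by hand in base R. This is a rough sketch only: roc_points is a hypothetical helper written for this illustration, the input values are made up, and ROCR's own implementation may differ in detail.

```r
# Hypothetical helper: ROC points from numeric scores and 0/1 labels.
roc_points <- function(scores, labels) {
  # One threshold per distinct score, plus Inf so the curve starts at (0, 0)
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Hard 0/1 predictions: only the two end points plus one interior point
hard <- roc_points(c(1, 0, 1, 0), c(1, 0, 0, 1))
# Continuous scores (made-up values): one point per distinct score
soft <- roc_points(c(0.9, 0.2, 0.6, 0.4), c(1, 0, 0, 1))
print(hard)
print(soft)
```

This shows why a ROC curve built from 0/1 predictions, as in the code above, consists of a single corner between (0, 0) and (1, 1).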
Output: the ROC curve plot (image not available).