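Problem description
I tried a Naive Bayes classifier to see whether I could predict, from a person's age and estimated salary, whether they will purchase a particular vehicle. The plot I get in the visualisation step does not look smooth and clean, and white lines run through it. I assume the graphics/resolution is the problem, but I'm not sure.
Here is a snippet of the dataset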
Age EstimatedSalary Purchased
19 19000 0
35 20000 0
26 43000 0
27 57000 0
19 76000 0
27 58000 0
Here is the code
# Loading the data set
data <- read.csv(" *A csv sheet on people's age,salaries and whether or not they will purchase a certain vehicle* ")
data <- data[,3:5]
# Encoding the dependent variable as a factor
data$Purchased <- factor(data$Purchased,levels = c(0,1))
# Splitting the dataset (columns are referenced via data$ instead of attach())
library(caTools)
set.seed(404)
split <- sample.split(data$Purchased,SplitRatio = 0.75)
train_set <- subset(data,split == TRUE)
test_set <- subset(data,split == FALSE)
# Feature scaling
train_set[-3] <- scale(train_set[-3])
test_set[-3] <- scale(test_set[-3])
# Training the model
library(e1071)
classifier <- naiveBayes(x = train_set[-3],y = train_set$Purchased)
# Predicting test results
y_pred <- predict(classifier,newdata = test_set[-3])
# Construct the confusion matrix
(cm <- table(test_set[,3],y_pred))
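The feature-scaling step above scales the test set with its own mean and standard deviation. A common alternative is to reuse the centre and scale learned from the training set, so that both sets live on the same axes. The sketch below illustrates this with base R only; the toy data frames `train` and `test` are hypothetical stand-ins for `train_set[-3]` and `test_set[-3]`, not the real data.

```r
# Hypothetical toy data standing in for the real feature columns
set.seed(1)
train <- data.frame(Age = rnorm(30, 38, 10),
                    EstimatedSalary = rnorm(30, 70000, 30000))
test  <- data.frame(Age = rnorm(10, 38, 10),
                    EstimatedSalary = rnorm(10, 70000, 30000))

# Scale the training features and keep the parameters scale() stored
train_scaled <- scale(train)
centre <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Apply the training parameters to the test features
test_scaled <- scale(test, center = centre, scale = spread)
```

Whether this changes the plot is not something the post establishes; it only affects how the test points are positioned relative to the classifier's decision regions.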
Below is the code I used to visualise the results
# Visualising the results
library(ElemStatLearn)
set <- test_set
x1 <- seq(min(set[,1]) - 1,max(set[,1]) + 1,by = 0.01)
x2 <- seq(min(set[,2]) - 1,max(set[,2]) + 1,by = 0.01)
grid_set <- expand.grid(x1,x2)
colnames(grid_set) <- c("Age","EstimatedSalary")
y_grid <- predict(classifier,newdata = grid_set)
plot(set[,-3],main = "Naive Bayes: Test set",xlab = "Age",ylab = "EstimatedSalary",xlim = range(x1),ylim = range(x2))
contour(x1,x2,matrix(as.numeric(y_grid),length(x1),length(x2)),add = T)
points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"))
points(set,pch = 21,bg = ifelse(set[,3] == 1,"green4","red3"))
(Figure: Naive Bayes classifier plot on the test set predictions)
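I would like to know why white lines run vertically through the plot and why it does not look smooth.
Solution
So I figured out what was giving me the strange lines and the low-quality resolution. Adding the "cex = n" argument to the "points()" call that draws the grid, with n = 5, solved the problem.
Modified code block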
set <- test_set
x1 <- seq(min(set[,1]) - 1,max(set[,1]) + 1,by = 0.01)
x2 <- seq(min(set[,2]) - 1,max(set[,2]) + 1,by = 0.01)
grid_set <- expand.grid(x1,x2)
colnames(grid_set) <- c("Age","EstimatedSalary")
y_grid <- predict(classifier,newdata = grid_set)
plot(set[,-3],main = "Naive Bayes: Test set",xlab = "Age",ylab = "EstimatedSalary",xlim = range(x1),ylim = range(x2))
contour(x1,x2,matrix(as.numeric(y_grid),length(x1),length(x2)),add = T)
points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"),cex = 5)
points(set,pch = 21,bg = ifelse(set[,3] == 1,"green4","red3"))
Modified line of code
points(grid_set,pch = ".",col = ifelse(y_grid == 1,"Springgreen3","tomato"),cex = 5)
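However, I would still like to understand the reason behind this, because the R documentation of these functions and arguments is not very clear to me.
Any help you can offer is greatly appreciated!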
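A possible explanation, offered as a sketch rather than a confirmed diagnosis: according to the `points()` help page, `pch = "."` is drawn as a very small rectangle (side 0.01 inch, scaled by `cex`). If the grid spacing ends up wider than that rectangle on the graphics device, strips of background can show through as white lines, and raising `cex` makes the rectangles overlap. An alternative that does not depend on dot size is `image()`, which fills one rectangle per grid cell. In the example below the grid and the decision rule are hypothetical; `y_grid` stands in for `predict(classifier,newdata = grid_set)`.

```r
# Hypothetical grid and decision rule (not the real classifier)
x1 <- seq(-3, 3, by = 0.05)
x2 <- seq(-3, 3, by = 0.05)
grid_set <- expand.grid(Age = x1, EstimatedSalary = x2)
y_grid <- ifelse(grid_set$Age + grid_set$EstimatedSalary > 0, 1, 0)

# expand.grid() varies its first argument fastest, so filling a matrix
# column by column gives rows indexed by x1 and columns indexed by x2,
# which is the layout image() and contour() expect
z <- matrix(y_grid, nrow = length(x1), ncol = length(x2))

# image() paints one filled cell per grid point: 0 -> tomato, 1 -> springgreen3
image(x1, x2, z, col = c("tomato", "springgreen3"),
      xlab = "Age", ylab = "EstimatedSalary",
      main = "Decision regions (toy example)")

# Draw the class boundary on top of the filled regions
contour(x1, x2, z, levels = 0.5, drawlabels = FALSE, add = TRUE)
```

The test-set points could then be added with the same `points(set,pch = 21,...)` call used in the question.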