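Problem description
This may be a naive question, but I could not find a solution. I have a data frame of count data from field surveys, and I want to predict species richness using Poisson regression. The surveys were assigned to grid cells of equal size, but the number of surveys conducted in each cell varies, so I want to include the number of surveys per grid cell as an offset. The problem is that when I predict from the glm output using a raster stack, it expects a raster layer for the offset variable (the number of surveys per grid cell). My question is how to incorporate that offset variable into the raster stack so that I can produce a spatial prediction (i.e., the prediction should be a raster file). Below is my reproducible example (using fewer variables):
Creating the data frame: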
bio2 <- c(12.74220,14.10092,13.82644,14.30550,15.02780,14.88224,13.98853,14.89524,15.59887,13.98664,14.75405,15.38178,14.50719,15.00427,12.77741,13.25432,12.91208,15.75312,15.36683,13.33202,12.55190,14.94755,13.52424,14.75273,14.42298,15.37897,12.02472,15.49786,14.28823,13.01982,13.60521,15.07687,14.17427,13.24491,14.84833,13.52594,13.92113,11.39738,14.31446,12.10239)
bio9 <- c(26.30980,26.52826,27.03376,23.93621,26.48416,26.05859,25.37550,25.34595,25.34056,23.37793,25.74681,22.72016,22.00458,24.37140,22.95169,24.52542,24.63087,22.86291,23.10240,23.79215,24.86875,21.40718,23.84258,21.91964,25.97682,24.97625,22.31471,19.64094,23.93386,25.87234,25.99514,17.17149,20.72802,18.22862,24.51112,24.33626,23.90822,23.43660,23.07425,20.71244)
count <- c(37,144,91,69,36,32,14,34,48,168,15,21,29,24,16,11,18,64,37,31,9,4,10,43,88,26,20,5,75,8,26)
sitesPerGrid <- c(3,3,1,2,6,7,22,5)
testdf <- data.frame(bio2,bio9,count,sitesPerGrid)
pois1 <- glm(count ~ bio2 + bio9, offset = log(sitesPerGrid), family = poisson(link = "log"), data = testdf)
Spatial prediction:
library(raster)
bio_2 <- bio_9 <- raster(nrow=5,ncol=8,xmn=0,xmx=1,ymn=0,ymx=1)
values(bio_2) <- bio2
values(bio_9) <- bio9
predRas <- stack(bio_2,bio_9)
names(predRas) <- c("bio2","bio9")
pdPois <- raster::predict(predRas,pois1,type = "response")
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
#   variable lengths differ (found for 'bio9')
# In addition: Warning message:
# 'newdata' had 16 rows but variables found have 40 rows
I get an error because it expects a raster layer for sitesPerGrid. But I do not want to use sitesPerGrid as a predictor variable.
Update
Following the comment and answer from @robertHijmans, I tried the following code:
pdPois <- raster::predict(predRas,pois1,const = testdf[,"sitesPerGrid"],type = "response")
I again received an error:
Error in data.frame(...,check.names = FALSE) : arguments imply differing number of rows: 143811,40
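One reading of this error is that `const` received 40 training values, which cannot be combined with the block of raster cell values. As far as I can tell from the raster documentation, `const` is meant to be a data frame of constants, so a one-row data frame should be recycled across all cells. A minimal sketch under that assumption follows. It uses synthetic stand-in data rather than the survey data, the names `simdf`, `fit`, `r2`, `r9`, `s` and `effort` are hypothetical, and the offset is written as an `offset()` term in the formula instead of the `offset =` argument:

```r
library(raster)

# Synthetic stand-in data (NOT the survey data from the question)
set.seed(1)
n <- 40
simdf <- data.frame(
  bio2 = runif(n, 11, 16),
  bio9 = runif(n, 17, 27),
  sitesPerGrid = sample(1:10, n, replace = TRUE)
)
# Expected count grows in proportion to the number of surveys
simdf$count <- rpois(n, simdf$sitesPerGrid * exp(1 + 0.1 * simdf$bio2 - 0.02 * simdf$bio9))

# Offset given as a formula term (an assumption of this sketch)
fit <- glm(count ~ bio2 + bio9 + offset(log(sitesPerGrid)),
           family = poisson(link = "log"), data = simdf)

# Predictor rasters with the same 5 x 8 layout as in the question
r2 <- r9 <- raster(nrows = 5, ncols = 8, xmn = 0, xmx = 1, ymn = 0, ymx = 1)
values(r2) <- simdf$bio2
values(r9) <- simdf$bio9
s <- stack(r2, r9)
names(s) <- c("bio2", "bio9")

# One-row data frame: the assumed survey effort applied to every cell
effort <- data.frame(sitesPerGrid = mean(simdf$sitesPerGrid))
p <- raster::predict(s, fit, const = effort, type = "response")
```

If this behaves as expected, `p` is a single raster layer of predicted counts for the assumed effort, and no extra layer has to be added to the stack.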
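Solution
Using a raster for the offset variable solved the problem. The raster is created from an assumption about survey effort. For example, I may want to see the prediction for one site per grid cell, or for mean(sitesPerGrid) or max(sitesPerGrid) sites per cell. If my assumption is mean(sitesPerGrid), the prediction raster is obtained as follows: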
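# make a new raster for sitesPerGrid, using the bio_2 raster as a template
rasGrid <- bio_2
# fill every cell with the assumed number of surveys
rasGrid[] <- mean(testdf$sitesPerGrid)
predRas <- stack(bio_2,bio_9,rasGrid)
# layer names must match the variable names used in the model
names(predRas) <- c("bio2","bio9","sitesPerGrid")
p <- raster::predict(predRas,pois1,type = "response")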
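With a log link, the assumed number of surveys only rescales the prediction: the predicted count for k surveys is k times the predicted count for one survey. The sketch below checks this with base R only. It uses synthetic data rather than the survey data, the names `simdf`, `fit`, `nd_one` and `nd_mean` are hypothetical, and the offset is written as an `offset()` term in the formula, a common alternative to the `offset =` argument:

```r
# Synthetic data for illustration only (NOT the survey data above)
set.seed(42)
n <- 40
simdf <- data.frame(
  bio2 = runif(n, 11, 16),
  bio9 = runif(n, 17, 27),
  sitesPerGrid = sample(1:10, n, replace = TRUE)
)
# Expected count = number of surveys * per-survey rate
simdf$count <- rpois(n, simdf$sitesPerGrid * exp(1 + 0.1 * simdf$bio2 - 0.02 * simdf$bio9))

fit <- glm(count ~ bio2 + bio9 + offset(log(sitesPerGrid)),
           family = poisson(link = "log"), data = simdf)

# Same covariates, two different assumptions about survey effort
k <- mean(simdf$sitesPerGrid)
nd_one  <- transform(simdf, sitesPerGrid = 1)
nd_mean <- transform(simdf, sitesPerGrid = k)
p_one  <- predict(fit, newdata = nd_one,  type = "response")
p_mean <- predict(fit, newdata = nd_mean, type = "response")

# The ratio is the same in every row and equals the assumed effort k
ratio <- unname(p_mean / p_one)
print(all.equal(ratio, rep(k, n)))
```

So the maps for one site, mean(sitesPerGrid) or max(sitesPerGrid) differ only by a constant factor. The spatial pattern is identical and only the scale of the predicted counts changes.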