Convert a quartic kernel heatmap to a large polygon using R

Problem description

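I have point data off the coast of Oahu. Someone else used those same data to create a large polygon. I believe he first created a heatmap using a quartic (biweight) kernel with a radius of 1 km around each point and perhaps a pixel size of 1 square km. He cited Silverman (1986, p. 76, equation 4.5), which I believe refers to the book "Density Estimation for Statistics and Data Analysis". I believe he then converted his heatmap into his polygon. I am trying to approximate his polygon with fake data using R on Windows 10. I can come close using the kde function in the ks package (see the figure below), but that package only includes Gaussian kernels. Is it possible to create a similar polygon using a quartic kernel?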

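The other analyst actually created two versions of the polygon. The border of one was labeled "> 1 per km density"; the border of the other was labeled "> 0.5 per km density". I do not know whether he used R, QGIS, ArcGIS or something else. I was unable to create a single large polygon in QGIS, and I do not have ArcGIS.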

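Thank you for any suggestions on how to create a polygon similar to the one shown, but using a quartic kernel instead of a Gaussian kernel. If I can provide additional information, please let me know.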

My fake data are available in CSV and QGIS format. (EDIT: Hopefully anyone can access the fake data now. I could before, but I guess others could not.) The files are:

1. fake_points_oahu.csv

     a. raw data

2. fake_points_oahu_utm (.shp,.dbf,.prj,.shx) 

     a. vector point layer 

3. fake_points_oahu_June11_2021.png

     a. the figure shown above

Here is my R code:

setwd('C:/Users/mark_/Documents/ctmm/density_in_R/density_files_for_StackOverflow/')

library(sf) # to read shapefile
library(ks) # to use kde function

my.data <- read.csv("fake_points_oahu.csv",header = TRUE,stringsAsFactors = FALSE,na.strings = "NA")
head(my.data)

# Import shapefile
st_layers("fake_points_oahu_utm.shp")

points_utm <- st_read(dsn = "fake_points_oahu_utm.shp",layer = 'fake_points_oahu_utm')
st_crs(points_utm)
plot(points_utm)

my.matrix <- as.matrix(my.data[,2:3])
head(my.matrix)

# This uses the Gaussian kernel
my_gps_hpi <- Hpi(x = my.matrix,pilot = "samse",pre = "scale")

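# For bivariate data, ks::kde takes the bandwidth matrix from Hpi() via H (h is the scalar 1-d bandwidth)
my.fhat <- kde(x = my.matrix,compute.cont = TRUE,H = my_gps_hpi,
               xmin = c(min(my.data$longitude),min(my.data$latitude)),
               xmax = c(max(my.data$longitude),max(my.data$latitude)),
               bgridsize = c(500,500))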

my.contours <- c(96.5)

contourLevels(my.fhat,cont = my.contours)
contourSizes(my.fhat,cont = my.contours,approx = TRUE)

plot(my.data$longitude,my.data$latitude)
plot(my.fhat,lwd = 3,display = "filled.contour",add = TRUE)

png(file="fake_points_oahu_June11_2021.png")

     plot(my.data$longitude,my.data$latitude)
     plot(my.fhat,add = TRUE)

dev.off()

Solution

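You can perform your estimation with a slightly modified version of the kde2d function from the MASS package. As far as I know, there is currently no R package that implements bivariate KDE with a quartic (biweight) kernel.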

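The univariate biweight kernel can be extended to a multivariate kernel in several ways. The simplest is a product kernel: you apply the univariate kernel along each dimension and multiply the results. You can find the mathematical expression for the biweight product kernel here. When you plug this kernel into the kde2d density estimator from the MASS package, it looks as follows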

kde_biweight_kernel <- function(x,y,bw_x,bw_y,xrange,yrange){
  # This function is based on the kde2d function from 
  # the MASS package. The only difference is that the Gaussian
  # kernel is substituted with a biweight product kernel
  
  # product kernel:
  biweight_kernel <- function(u){
    mask = abs(u) > 1
    kernel_val = (15/16)*((1-u^2)^2)
    kernel_val[mask] = 0
    return(kernel_val)
  }
  
  lims = c(xrange,yrange)
  n = 500
  nx <- length(x)
  n <- rep(n,length.out = 2L)
  # get grid on which we want to estimate the density
  gx <- seq.int(lims[1L],lims[2L],length.out = n[1L])
  gy <- seq.int(lims[3L],lims[4L],length.out = n[2L])
  
  # inputs to kernel: scaled differences between each grid coordinate
  # and each data point, one matrix per axis
  ax <- outer(gx,x,"-" )/bw_x
  ay <- outer(gy,y,"-" )/bw_y
  
  # evaluate and multiply kernel results along both axes
  res = tcrossprod(biweight_kernel(ax),biweight_kernel(ay))/(nx * bw_x * bw_y)
  return(list(x = gx,y = gy,z = res))
}
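Written out, the estimator that this function evaluates appears to be the following, reading it off the code above. Here n is the number of points (nx in the code) and h_x, h_y stand for bw_x, bw_y; the notation is mine, not taken from the linked expression.

```latex
K(u) = \frac{15}{16}\,\bigl(1 - u^2\bigr)^2 \,\mathbf{1}\{|u| \le 1\},
\qquad
\hat{f}(x, y) = \frac{1}{n\,h_x h_y} \sum_{i=1}^{n}
  K\!\left(\frac{x - x_i}{h_x}\right) K\!\left(\frac{y - y_i}{h_y}\right)
```

Because this is a product of two one-dimensional kernels, each point contributes over a rectangle with half-widths h_x and h_y rather than over a disc.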

Using the kde_biweight_kernel function, you can compute the desired density as follows

library(MASS)
library(birk)
library(kedd)
library(sf)
library(ks)


# load data
my.data <- read.csv("fake_points_oahu.csv",header = TRUE,stringsAsFactors = FALSE,na.strings = "NA")
# Import shapefile
st_layers("fake_points_oahu_utm.shp")
points_utm <- st_read(dsn = "fake_points_oahu_utm.shp",layer = 'fake_points_oahu_utm')

x = my.data$longitude
y = my.data$latitude

# determine bandwidth for biweight kernel along both axes
bw_x = h.amise(x,deriv.order = 0,kernel = "biweight")$h
bw_y = h.amise(y,deriv.order = 0,kernel = "biweight")$h

# get ranges in which you want to estimate density
xrange = c(min(my.data$longitude),max(my.data$longitude))
yrange = c(min(my.data$latitude),max(my.data$latitude))

# get 2d density estimate with quartic (biweight) kernel
result = kde_biweight_kernel(x,y,bw_x,bw_y,xrange,yrange)
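As a quick sanity check on this kind of estimate, you can verify that the surface integrates to roughly 1. The sketch below is self-contained and uses made-up uniform points and hand-picked bandwidths (both are assumptions for illustration, not the Oahu data or the h.amise values), with the grid padded by one bandwidth on each side.

```r
# Made-up points and bandwidths, for illustration only
set.seed(1)
x <- runif(50)
y <- runif(50)
bw_x <- 0.2
bw_y <- 0.2

# Univariate biweight kernel, zero outside [-1, 1]
biweight_kernel <- function(u) ifelse(abs(u) > 1, 0, (15/16) * (1 - u^2)^2)

# Grid padded by one bandwidth so that no kernel mass falls outside it
gx <- seq(min(x) - bw_x, max(x) + bw_x, length.out = 400)
gy <- seq(min(y) - bw_y, max(y) + bw_y, length.out = 400)

# Same product-kernel computation as in kde_biweight_kernel
ax <- outer(gx, x, "-") / bw_x
ay <- outer(gy, y, "-") / bw_y
z <- tcrossprod(biweight_kernel(ax), biweight_kernel(ay)) /
  (length(x) * bw_x * bw_y)

# Riemann-sum approximation of the integral of the density over the grid
total_mass <- sum(z) * diff(gx[1:2]) * diff(gy[1:2])
total_mass
```

If the grid only spans the range of the data, part of the kernel mass around the outermost points falls outside the grid, so the same sum comes out below 1.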

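Note that the bandwidths are computed specifically for the biweight kernel. The resulting density object differs slightly from a ks::kde object. For example, it does not have contour levels yet. We can obtain contour levels by computing quantiles with a slightly modified version of the kde2dQuantile function from the rmngb package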

# get quantiles of interest:
kde2dQuantile <- function(d,X,Y,probs = .05) {
  xInd <- sapply(X,function(x) which.closest(d$x,x))
  yInd <- sapply(Y,function(x) which.closest(d$y,x))
  zValues <- d$z[cbind(xInd,yInd)]
  quantile(zValues,probs=probs)
}
# get quantiles
quantiles = kde2dQuantile(result,x,y,seq(0,1,by=0.001))

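From your question I was not sure which quantile you are interested in, so I simply computed the quantiles in steps of 0.1%. To be able to plot the data in the same way as in the question, we have to format the density result in the same way as objects of class kde: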

# to make the kde estimate compatible with the other density estimates
# from the ks package,the result can be converted to a named list.
# -> create ks::KDE object:
axes = matrix(c(result$x,result$y),ncol = 2)
colnames(axes) = c('longitude','latitude')

my.fhat_biweight = list('x' = axes,'eval.points' = list(result$x,result$y),'estimate' = result$z,'gridtype' = 'linear','gridded' = TRUE,'binned' = TRUE,'names' = c("longitude","latitude" ))

# add quantile to ks::KDE object
my.fhat_biweight$cont = quantiles

# change class (make sure ks package is loaded for this)
class(my.fhat_biweight) <- "kde"

Finally, plot the biweight kernel density over the data

plot(my.data$longitude,my.data$latitude)
plot(my.fhat_biweight,lwd = 3,display = "filled.contour",cont = c(96.5),add = TRUE)

This draws the filled biweight density contour over the data points.
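To get from a density surface to the polygons described in the question ("> 1 per km" and "> 0.5 per km"), one possible route is to threshold the surface with contourLines and build sf polygons from the rings. The sketch below is an assumption-laden illustration, not the other analyst's method. It swaps the product kernel for a radial quartic kernel with a fixed 1 km radius, which is what I recall Silverman's equation 4.5 to be (worth checking against the book). It uses made-up points in metre coordinates and a hypothetical 100 m grid, and the helper density_polygon is my own name.

```r
library(sf)

# Made-up points in a projected CRS with metre units (NOT the Oahu data)
set.seed(42)
px <- rnorm(200, mean = 5000, sd = 1500)
py <- rnorm(200, mean = 5000, sd = 1500)

h <- 1000  # kernel radius in metres (1 km, as described in the question)

# Grid padded by one radius so the surface is zero on the grid edge
gx <- seq(min(px) - h, max(px) + h, by = 100)
gy <- seq(min(py) - h, max(py) + h, by = 100)

# Radial quartic kernel: 3 / (pi * h^2) * (1 - d^2 / h^2)^2 for d < h.
# Summing over points without dividing by n gives points per square metre.
z <- matrix(0, nrow = length(gx), ncol = length(gy))
for (i in seq_along(px)) {
  d2 <- outer((gx - px[i])^2, (gy - py[i])^2, "+") / h^2
  z <- z + 3 / (pi * h^2) * pmax(1 - d2, 0)^2
}
z_km2 <- z * 1e6  # points per square kilometre

# Hypothetical helper: turn the region above `level` into one sf geometry
density_polygon <- function(gx, gy, z, level) {
  cl <- contourLines(gx, gy, z, levels = level)
  if (length(cl) == 0) return(st_sfc())
  polys <- lapply(cl, function(l) {
    m <- cbind(l$x, l$y)
    if (any(m[1, ] != m[nrow(m), ])) m <- rbind(m, m[1, ])  # close the ring
    st_sfc(st_polygon(list(m)))
  })
  # Rings at one level are nested or disjoint, so a symmetric difference
  # keeps holes as holes and merges separate patches
  Reduce(st_sym_difference, polys)
}

poly_1  <- density_polygon(gx, gy, z_km2, level = 1)    # > 1 per square km
poly_05 <- density_polygon(gx, gy, z_km2, level = 0.5)  # > 0.5 per square km

plot(poly_05, border = "grey40")
plot(poly_1, border = "black", add = TRUE)
points(px, py, pch = 20, cex = 0.4)
```

With the real data you would presumably take px and py from the UTM coordinates, for example st_coordinates(points_utm), and attach the CRS of that layer to the result. The pixel size and threshold values may need adjusting before the outline resembles the other analyst's polygon.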
