Point pattern classification with spatstat: what am I doing wrong?

Question

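I am trying to classify bivariate point patterns into groups using spatstat. The patterns are derived from whole slide images of lymph nodes with cancer. I trained a neural network to recognize three types of cells (cancer "LP", immune cells "bcell", and all other cells). I do not want to analyse the other cells; I only use them to build a polygonal window in the shape of the lymph node. The patterns to be analysed are therefore the immune cells and cancer cells inside polygonal windows. Each pattern can contain tens of thousands of cancer cells and up to 2 million immune cells. The patterns are of the "small world model" type, because points cannot lie outside the window.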

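My classification should be based on the position of the cancer cells relative to the immune cells. For example, in most cases the cancer cells lie on "islands" of immune cells, but in some cases the cancer cells are (seemingly) uniformly dispersed and there are only a few immune cells. In addition, the pattern is not always uniform across the node. As I am rather new to spatial statistics, I developed a simple and crude approach to classify the patterns. In short: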

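I calculated a kernel density of the immune cells with sigma=80, because this looked "nice" to me: Den<-density(split(cells)$"bcell",sigma=80,window= cells$window) (should I have used e.g. sigma=bw.scott instead?)
Then I created a tessellation image by dividing the density range into 3 parts (here again, I experimented with the breaks to get "good looking" results).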

rangesDenMax<-2*range(Den)[2]/3
rangesDenMin<-range(Den)[2]/3
map.breaks<-c(-Inf,rangesDenMin,rangesDenMax,Inf)
map.cuts <- cut(Den,breaks = map.breaks,labels = c("Low B-cell density","Medium B-cell density","High B-cell density"))
map.quartile <- tess(image = map.cuts,window=cells$window)
tessImage<-map.quartile
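On the sigma=bw.scott question, here is a minimal sketch of what a data-driven bandwidth and quantile-based breaks could look like. It is not the code from the question: the built-in lansing dataset stands in for one slide (with "maple" playing the role of "bcell"), and the names Immune_demo, sigma_scott, Den_scott, brks, map_cuts and tess_demo are made up for illustration.

```r
library(spatstat)

# Assumption: the built-in 'lansing' data stand in for one slide;
# "maple" plays the role of "bcell".
Immune_demo <- split(lansing)[["maple"]]

# Scott's rule of thumb returns one bandwidth per coordinate axis
sigma_scott <- bw.scott(Immune_demo)
Den_scott <- density(Immune_demo, sigma = sigma_scott)

# Thirds of the pixel-value distribution instead of thirds of the maximum
brks <- quantile(Den_scott, probs = c(0, 1/3, 2/3, 1))
map_cuts <- cut(Den_scott, breaks = brks, include.lowest = TRUE,
                labels = c("Low", "Medium", "High"))
tess_demo <- tess(image = map_cuts)
```

Quantile breaks give three tiles of roughly equal area by construction, so they answer a different question than thirds of the maximum; which is preferable depends on what the classes are meant to represent.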

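Here are some examples of the tessellation plots with an overlay of cancer cells (white dots). The lymph node on the left has typical, evenly distributed "islands" of immune cells, while the one on the right has only a few dense spots of immune cells, and its cancer cells are not restricted to these spots:

[Figure: heat map of the immune cell kernel density; white dots are cancer cells]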

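Then I measured some silly variables that should give me a clue about how the cancer cells are distributed across the tessellation tiles (the code for the calculations is trivial, so I post only the descriptions of the variables):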

LPlwB<-c() # proportion of cancer cells in low-b-cell-area 
LPmdB<-c() # proportion of cancer cells in medium-b-cell-area 
LPhiB<-c() # proportion of cancer cells in high-b-cell-area
AlwB<-c()  # proportion of the low-b-cell area
AmdB<-c()  # proportion of the medium-b-cell area
AhiB<-c()  # proportion of the high-b-cell area
LPm1<-c()  # mean distance to the 1st neighbour
LPm2<-c()  # mean distance to the 2nd neighbour
LPm3<-c()  # mean distance to the 3rd neighbour
LPsd1<-c() # standard deviation of the distance to the 1st neighbour
LPsd2<-c() # standard deviation of the distance to the 2nd neighbour
LPsd3<-c() # standard deviation of the distance to the 3rd neighbour
meanQ<-c() # mean quadrat count (I visually chose the quadrat size to be not too large and not too small)
sdevQ<-c() # standard deviation of the quadrat counts
hiSAT<-c() # realised cancer cell saturation in high b-cell-area (number of cells observed divided by the number of cells that could be fitted into the area, given the observed minimum distance between cells)
mdsAT<-c() # realised cancer cells saturation in medium b-cell-area 
lwSAT<-c() # realised cancer cells saturation in low b-cell-area 
ll<-c() # Proportion LP neighbours of LP (contingency table count divided by total points) 
lb<-c() # Proportion b-cell neighbours of LP
bl<-c() # Proportion LP neighbours of b-cells
bb<-c() # Proportion b-cell neighbours of b-cells
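Since the calculation code was not posted, the following is only a hedged sketch of how a few of these variables could be computed with spatstat for a single pattern. It uses two species from the built-in lansing dataset as a stand-in ("hickory" for "LP", "maple" for "bcell"); every name ending in _demo, and tiles, n_tile, prop_cells, prop_area, nn, nn_mean, nn_sd, nn_type and nn_tab, are illustrative and not taken from the question.

```r
library(spatstat)

# Assumption: two species of the built-in 'lansing' data stand in for one slide
cells_demo <- lansing[marks(lansing) %in% c("hickory", "maple")]
marks(cells_demo) <- droplevels(marks(cells_demo))
Cancer_demo <- split(cells_demo)[["hickory"]]
Immune_demo <- split(cells_demo)[["maple"]]

# Three density classes by thirds of the maximum, as in the question
Den <- density(Immune_demo, sigma = 0.1)
mx <- max(Den)
tiles <- tess(image = cut(Den, breaks = c(-Inf, mx / 3, 2 * mx / 3, Inf),
                          labels = c("low", "medium", "high")))

# Proportion of cancer cells and proportion of area in each tile
n_tile <- as.vector(quadratcount(Cancer_demo, tess = tiles))
prop_cells <- n_tile / sum(n_tile)
prop_area <- tile.areas(tiles) / sum(tile.areas(tiles))

# Mean and SD of the distances to the 1st, 2nd and 3rd nearest neighbour
nn <- nndist(Cancer_demo, k = 1:3)
nn_mean <- colMeans(nn)
nn_sd <- apply(nn, 2, sd)

# Nearest-neighbour contingency table of cell types, as proportions of all points
nn_type <- marks(cells_demo)[nnwhich(cells_demo)]
nn_tab <- table(point = marks(cells_demo), neighbour = nn_type) / npoints(cells_demo)
```

Collecting these values into one row per slide gives the kind of feature table that the scaling, PCA and clustering steps operate on.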

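I z-scaled the variables, inspected them in a PCA plot (the vectors pointed in different directions, like the spines of a sea urchin) and performed a hierarchical cluster analysis. I chose k by computing fviz_nbclust(scaled_variables,hcut,method = "silhouette"). After cutting the dendrogram into k clusters and checking cluster stability, I ended up with my groups, which seemed to make sense, because the cases with "islands" were separated from the "more dispersed" cases.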

However, given the possibilities of the spatstat package, I strongly feel that I am hammering nails into a wall with a smartphone.

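Since I expected both cell types to be clustered in most cases, I ran a test of inhomogeneity (a.k.a. quadrat test): quadrat.test(split(cells)$"LP"), which strongly suggested inhomogeneity in all cases. I also ran a test of clustering, hopskel.test(split(cells)$"LP",method = "MonteCarlo",nsim = 19,alternative="clustered"), which indicated clustering of the cancer cells in all patterns (even the seemingly dispersed ones). Are these tests valid for my inhomogeneous small-world patterns?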

Then I tried to make sense of modelling, with little success.

unamrkPat<-unmark(split(cells)$"LP")
covarIm<-density(split(cells)$"bcell",sigma=80)
m1 <-kppm(unamrkPat ~ covarIm)
sim1<-simulate(m1,nsim=19)
env1 <-envelope(split(cells)$"LP",Lest,nsim=19,simulate=sim1,correction="none")
plot(env1)

However, I could not figure out how to use the simulations and envelopes to classify my patterns, which is my main goal.

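Can someone help me understand whether my naive approach of tessellation images, variable extraction and hierarchical clustering is a valid analysis? Which alternative approaches should or could I use? Is there a simple way to classify my patterns using models? Sorry for the long question. If someone can answer only part of it, I would be grateful!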

Answer

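You seem to be trying to quantify the way in which the cancer cells are positioned relative to the immune cells. You could do this with something like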

Cancer <- split(cells)[["LP"]]
Immune <- split(cells)[["bcell"]]
Dimmune <- density(Immune,sigma=80)
f <- rhohat(Cancer,Dimmune)
plot(f)

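Then f is a function that gives the intensity of cancer cells (number per unit area) as a function of the density of immune cells. The plot shows the density of cancer cells on the vertical axis against the density of immune cells on the horizontal axis.

If the graph of this function is flat, the cancer cells are not paying any attention to the density of immune cells. If the graph decreases steeply, the cancer cells tend to avoid the immune cells.

I suggest you first look at plots of f for some example datasets, to determine whether f has any ability to discriminate between the spatial arrangements that you think should be classified as different. If so, you can use as.data.frame to extract the values of f and then use classical discriminant analysis (etc.) to classify the slide images into groups.

Instead of density(Immune) you could use any other summary of the immune cells. For example, D <- distfun(Immune) gives the distance to the nearest immune cell, and then f would compute the density of cancer cells as a function of the distance to the nearest immune cell. And so on.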
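As a hedged sketch of the distfun variant and of the as.data.frame extraction, the code below uses two species from the built-in lansing dataset as stand-ins ("hickory" for the cancer cells, "maple" for the immune cells). The names Cancer_demo, Immune_demo, f_dist, fd, grid and rho_on_grid are illustrative, and the grid limits are an arbitrary assumption. Each rhohat estimate is tabulated over the covariate range of its own pattern, so the sketch interpolates onto a fixed grid with base R approx to make values comparable between patterns; that interpolation step is a suggestion added here, not part of the answer above.

```r
library(spatstat)

# Assumption: two species of the built-in 'lansing' data stand in for one slide
Cancer_demo <- split(lansing)[["hickory"]]
Immune_demo <- split(lansing)[["maple"]]

# Covariate: distance to the nearest "immune" point
D <- distfun(Immune_demo)
f_dist <- rhohat(Cancer_demo, D)

# Tabulated estimate: the first column holds the covariate values,
# 'rho' holds the estimated intensity
fd <- as.data.frame(f_dist)

# Evaluate on a fixed grid so that column k means the same distance in every pattern
grid <- seq(0, 0.1, length.out = 20)  # hypothetical common grid
rho_on_grid <- approx(fd[[1]], fd$rho, xout = grid, rule = 2)$y
```

Stacking one rho_on_grid vector per slide with rbind then yields a matrix with one row per pattern, which can go into discriminant analysis or clustering. With rule = 2, grid values outside a pattern's covariate range take the nearest tabulated value, which is a crude choice worth checking against the plots.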

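@Adrian Baddeley: Thank you, Adrian! This was really helpful! I picked a few "typical" cases and analysed them using both my variable extraction and the suggested "resource selection function" approach. I chose density(Immune) as the covariate. After saving all the functions in a list, I extracted the f values in a somewhat clumsy way: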

# flist contains the rho-functions; auswahl contains the cells point patterns
dflist <- list()
for (i in seq_along(flist)) {
  dflist[[i]] <- t(as.data.frame(flist[[i]]))
}
varnames <- c()
for (j in 1:nrow(dflist[[1]])) {
  varnames <- append(varnames, paste0(row.names(dflist[[1]])[j], colnames(dflist[[1]])))
}
df <- as.data.frame(matrix(ncol = length(varnames), nrow = length(flist),
                           dimnames = list(names(auswahl), varnames)))
for (i in seq_along(auswahl)) {
  vars <- c()  # reset once per pattern, not once per row
  for (j in 1:nrow(dflist[[1]])) {
    vars <- append(vars, dflist[[i]][j, ])
  }
  df[i, ] <- as.numeric(vars)  # assign the complete concatenated row
}

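Since I do not know the groups a priori, I performed hierarchical clustering on "my" variables and on the rhohat-derived data, and compared the two with dendextend::tanglegram (left: "my" variables; right: rhohat-derived data; cases g1-g2 are visually different from g3-g4).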

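Surprisingly, the dendrograms are very similar. I then looked into the great book by you, Ege and Rolf to find out how to measure the strength of the covariate effect. I tried the scatterplot pairs(predict(f),Dimmune) (p.181) and the Kolmogorov-Smirnov test cdf.test(Cancer,Dimmune,test="ks"). The tests were significant, but the fit did not look very good.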

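Although immune cell density does not seem to predict the cancer cells well, can I still use the rhohat-derived data for clustering? Or did I misunderstand the tests?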
