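Problem description
I have an X by Y grid in which a cell contains 1 if a certain condition is met and 0 if it is not. I now want to identify features in the grid that consist of at least N contiguous cells containing 1. Contiguous cells may be adjacent side by side or diagonally. I made a picture to illustrate the problem (see link), with N = 5. For clarity I left out the 0s; they are in the unlabelled cells. The red 1s belong to the features I want to identify; the black 1s do not. The desired result would look like the picture, but with all black 1s changed to 0. I use R, so a solution in that language would be much appreciated, but I am happy to accept others. I could not find anything specific in R libraries (e.g. rgeos), but perhaps I am missing something. Any help is appreciated, thanks!
Here is a small reproducible example: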
input.mat <- structure(c(1L,1L,0L,1L),.Dim = c(15L,15L),.Dimnames = list(NULL,NULL))
input.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[2,] 1 1 0 0 1 1 1 0 0 1 0 0 0 1 0
[3,] 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0 0 1
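The `structure()` call above is abbreviated in this copy, so it cannot be pasted as is. As a hedged workaround, the matrix can be rebuilt from the printed rows. This is only a sketch: the helper variable `rows` is introduced here for illustration, and the values are copied from the printout above.

```r
# Rebuild input.mat from the printed rows (one string per matrix row).
rows <- c(
  "1 1 0 0 0 0 0 0 0 0 0 0 1 0 0",
  "1 1 0 0 1 1 1 0 0 1 0 0 0 1 0",
  "0 0 1 0 0 0 0 0 0 1 1 0 1 0 1",
  "0 0 0 1 0 0 0 0 0 0 0 0 0 1 0",
  "0 0 0 0 0 0 0 0 0 0 0 1 0 0 0",
  "1 0 0 0 0 0 0 0 0 0 1 0 1 1 0",
  "1 1 0 0 0 0 0 0 0 0 0 1 0 0 0",
  "1 1 0 0 0 0 0 0 0 0 0 0 0 0 0",
  "1 0 0 0 0 1 0 1 0 0 0 1 1 1 0",
  "0 0 0 0 0 0 0 0 0 0 0 1 1 1 0",
  "0 0 1 0 1 0 0 0 0 0 0 0 0 0 1",
  "0 0 0 1 0 0 0 0 0 1 0 0 0 0 0",
  "0 0 1 0 1 0 0 0 1 0 0 0 0 0 1",
  "0 0 0 0 0 0 0 0 1 0 0 0 0 0 1",
  "1 1 1 1 1 0 0 0 1 1 0 0 0 0 1"
)

# Split each string on spaces and fill the matrix row by row.
input.mat <- matrix(
  as.integer(unlist(strsplit(rows, " "))),
  nrow = length(rows), byrow = TRUE
)

dim(input.mat)  # 15 15
```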
output.mat <- structure(c(1L,0L),NULL))
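output.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[2,] 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0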
Created on 2021-05-27 by the reprex package (v2.0.0)
Solution
Here is a base R approach that clusters the 2D points:
# Chebyshev distance from point `x` to each point in the set `S`
fdist <- function(x, S) {
  if (length(S) == 0) {
    return(0)
  }
  v <- x - S
  pmax(abs(Re(v)), abs(Im(v)))
}

# add `x` to the first cluster that contains a point adjacent to it
fgrp <- function(x, clst) {
  for (k in seq_along(clst)) {
    if (any(fdist(x, clst[[k]]) < 2)) {
      clst[[k]] <- c(clst[[k]], x)
      return(clst)
    }
  }
  clst  # no adjacent cluster found: return the list unchanged
}

# represent 2D points as complex numbers (row + column * 1i)
p <- c(which(input.mat == 1, arr.ind = TRUE) %*% c(1, 1i))

# initialize cluster list
clst <- list()
while (length(p) > 0) {
  idxrm <- c()
  for (k in seq_along(p)) {
    clst_new <- fgrp(p[k], clst)
    # the total size grows only if p[k] joined an existing cluster
    if (sum(lengths(clst_new)) > sum(lengths(clst))) {
      idxrm <- c(idxrm, k)
      clst <- clst_new
    }
  }
  if (length(idxrm) == 0) {
    # nothing joined in this pass: seed a new cluster with the first point
    clst <- c(clst, list(p[1]))
  } else {
    p <- p[-idxrm]
  }
}

# keep only clusters with at least N distinct points
N <- 5
Z <- do.call(
  c,
  Filter(function(x) length(x) >= N, Map(unique, clst))
)

# produce output matrix
output.mat <- input.mat * 0
output.mat[cbind(Re(Z), Im(Z))] <- 1
This gives:
> output.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1
[2,] 1 1 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 0 0 0 0 0 0 1 1
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0
[,14] [,15]
[1,] 0 0
[2,] 1 0
[3,] 0 1
[4,] 1 0
[5,] 0 0
[6,] 1 0
[7,] 0 0
[8,] 0 0
[9,] 1 0
[10,] 1 0
[11,] 0 1
[12,] 0 0
[13,] 0 0
[14,] 0 0
[15,] 0 0
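To see why the test `fdist(x, S) < 2` captures both side-by-side and diagonal adjacency, it may help to look at the encoding and the distance on a tiny example. The sketch below restates the distance under the hypothetical name `cheb` and uses made-up points; it is an illustration, not part of the answer's code.

```r
# Encode the positions of the 1s as complex numbers: row + column * 1i.
m <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)
pts <- c(which(m == 1, arr.ind = TRUE) %*% c(1, 1i))
pts  # 2+1i 1+2i

# Chebyshev distance: the larger of the row and column differences.
# (cheb is a hypothetical name; it mirrors the body of fdist.)
cheb <- function(x, S) {
  v <- x - S
  pmax(abs(Re(v)), abs(Im(v)))
}

centre <- 3 + 3i
others <- c(2 + 2i,  # diagonal neighbour
            3 + 4i,  # side neighbour
            3 + 5i,  # two columns away
            5 + 4i)  # two rows away

cheb(centre, others)      # 1 1 2 2
cheb(centre, others) < 2  # TRUE TRUE FALSE FALSE
```

A distance below 2 therefore holds for the eight surrounding cells (and the cell itself), which matches the contiguity rule in the question.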
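Idea
- Find the positions of the 1s, i.e. their row and column indices.
- For each point position, check whether it belongs to any existing cluster. If it does, assign the point to that cluster. Otherwise, create a new cluster with that point.
- The process terminates once all points have been checked.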
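The same idea can also be sketched as a queue-based flood fill that works directly on the matrix indices. This is an alternative formulation, not the code from the answer above; `keep_features` and the `toy` matrix are hypothetical names and data introduced only for illustration.

```r
# Label 8-connected groups of 1s and keep only groups with at least N cells.
# keep_features() is a hypothetical helper, not part of any package.
keep_features <- function(m, N) {
  out <- m * 0
  seen <- matrix(FALSE, nrow(m), ncol(m))
  for (start in which(m == 1)) {
    if (seen[start]) next
    # Grow one group outward from this seed cell.
    queue <- start
    seen[start] <- TRUE
    group <- integer(0)
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      group <- c(group, cur)
      # Convert the linear index back to row and column.
      r <- (cur - 1) %% nrow(m) + 1
      cc <- (cur - 1) %/% nrow(m) + 1
      # Visit the eight surrounding cells (the cell itself is already seen).
      for (dr in -1:1) for (dc in -1:1) {
        rr <- r + dr
        c2 <- cc + dc
        if (rr < 1 || rr > nrow(m) || c2 < 1 || c2 > ncol(m)) next
        if (m[rr, c2] == 1 && !seen[rr, c2]) {
          seen[rr, c2] <- TRUE
          queue <- c(queue, (c2 - 1) * nrow(m) + rr)
        }
      }
    }
    # Keep the group only if it is large enough.
    if (length(group) >= N) out[group] <- 1
  }
  out
}

toy <- matrix(c(1, 1, 0, 0,
                0, 1, 0, 1,
                0, 0, 0, 1,
                1, 0, 0, 0), nrow = 4, byrow = TRUE)
keep_features(toy, 3)
```

With `N = 3`, only the three-cell group in the top-left corner of `toy` should survive; the two-cell group on the right and the single cell in the bottom-left are set to 0. Because each cell is visited once, this variant avoids rescanning the remaining points on every pass, which may matter for larger grids.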