Extracting density estimates for different groups

Problem description

I have a data frame (df) that looks like this:

> summary(df)
   Occurence        Group          
 Min.   :0.001   Length:7990       
 1st Qu.:0.028   Class :character  
 Median :0.160   Mode  :character  
 Mean   :0.195                     
 3rd Qu.:0.307                     
 Max.   :0.600                     
 NA's   :5473

> unique(df$Group)
 [1] "fa20,0"   "sa20,0"   "fa05,0"   "sa10,0"   "flatsa,0" "flatfa,0" "fa10,0"   "sa05,1" "fa10,1"   "fa05,1"   "sa20,1"   "flatfa,1" "fa20,1"   "sa10,1"   "sa05,1" 

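I am trying to get a kernel density estimate of Occurence for each unique group using the density() function. I can do it for one group at a time: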

> flatsa <- density(c(as.numeric(ag04_pattern_long$Occurence[ag04_pattern_long$Group == "flatsa,0"])),na.rm=T)

> flatsa_df2 <- enframe(flatsa$x,value = "X") %>%
+     add_column(Y=flatsa$y) %>%
+     add_column(Group = "flatsa,0") %>%
+     select(-name)

which produces the following output for flatsa_df2:

# A tibble: 512 x 3
        X       Y Group   
    <dbl>   <dbl> <chr>   
 1 -0.168 0.00317 flatsa,0
 2 -0.166 0.00351 flatsa,0
 3 -0.164 0.00387 flatsa,0
 4 -0.162 0.00427 flatsa,0
 5 -0.161 0.00471 flatsa,0
 6 -0.159 0.00519 flatsa,0
 7 -0.157 0.00570 flatsa,0
 8 -0.155 0.00628 flatsa,0
 9 -0.153 0.00689 flatsa,0
10 -0.151 0.00755 flatsa,0
# ... with 502 more rows

How can I do this for all 16 unique elements of df$Group at once? Ideally, they would all be combined into a single data frame. I have tried:

dens_table <- setDT(ag04_pattern_long)[,.(dens=density(ag04_pattern_long$Occurence,na.rm=T)),by = Group]

for(i in length(unique(ag04_pattern_long$Group))){
  dens_table <- density(c(as.numeric(ag04_pattern_long$Occurence[i],na.rm=T)))
}

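But neither of these produces the correct output. The loop gives me an error saying it needs "at least 2 points to select a bandwidth". I think this indicates that not all of the df$Occurence values for each unique(df$Group) are being considered.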

Help!

Solution

Here is a base R approach:

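# One numeric vector of Occurence values per group
occur_list = split(df$Occurence,df$Group)
# Density estimate for each group, keeping only the x and y components
est_list = lapply(occur_list,function(x) {
  data.frame(density(x,na.rm=T)[c("x","y")])
})
results = do.call(rbind,est_list)
# times = (not each =) repeats every group name once per row of its own estimate
results$Group = rep(names(occur_list),times = sapply(est_list,nrow))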
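
To try the base R approach without the original data, here is a minimal sketch on simulated data. The data frame `toy`, its group labels, and its values are made up for illustration; only the column names `Occurence` and `Group` come from the question.

```r
# Simulated stand-in for the question's data (values and group labels are invented)
set.seed(1)
toy <- data.frame(
  Occurence = c(runif(30, 0.001, 0.6), NA, NA),
  Group     = rep(c("a,0", "b,0"), each = 16),
  stringsAsFactors = FALSE
)

# One numeric vector per group, then one density data frame per group
occur_list <- split(toy$Occurence, toy$Group)
est_list <- lapply(occur_list, function(x) {
  d <- density(x, na.rm = TRUE)   # density() evaluates at n = 512 points by default
  data.frame(X = d$x, Y = d$y)
})

# Stack the pieces and label each row with its group
toy_results <- do.call(rbind, est_list)
toy_results$Group <- rep(names(est_list), times = sapply(est_list, nrow))
rownames(toy_results) <- NULL

table(toy_results$Group)  # each group contributes 512 rows
```

The columns are named X and Y here only to mirror the tibble shown in the question; the names are a cosmetic choice.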

We can also adapt your attempt and use a for loop:

results = list()
# Iterate over the group names themselves, not over their count
for(i in unique(ag04_pattern_long$Group)){
  results[[i]] <- data.frame(density(ag04_pattern_long$Occurence[ag04_pattern_long$Group == i],na.rm = T)[c("x","y")])
  results[[i]]$Group = i
}
results = do.call(rbind,results)
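
As an aside on the error in the question: `for (i in length(x))` iterates over a single value (the length itself) rather than over `1:length(x)`, so the body runs once and `Occurence[i]` selects a single element. A small sketch with a made-up vector `groups`:

```r
groups <- c("a", "b", "c")  # made-up example vector

# length(groups) is the single number 3, so this loop body runs once
visited <- c()
for (i in length(groups)) visited <- c(visited, i)
visited

# seq_along(groups) is 1, 2, 3, so this loop visits every position
visited_all <- c()
for (i in seq_along(groups)) visited_all <- c(visited_all, i)
visited_all
```

Passing a single value to density() leaves it nothing to choose a bandwidth from, which is consistent with the reported error.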

Or using dplyr:

library(dplyr)
library(tidyr)

df %>% 
  nest_by(Group) %>%
  mutate(dens = list(data.frame(density(data$Occurence,na.rm = T)[c("x","y")]))) %>%
  select(-data) %>%
  unnest(cols = dens)

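In all cases, I removed the c(as.numeric()) from inside the loop. Before looping, make sure the whole Occurence column is numeric; that is better than converting each piece inside the loop.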
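
The summary in the question shows 5473 NA values out of 7990 rows, so some groups might end up with fewer than two non-missing values, in which case density() would stop with the same bandwidth error. The following is a hedged sketch of converting the column once and skipping such groups. The helper name `safe_density` and the data frame `toy2` are hypothetical and not part of the original answer.

```r
# Hypothetical helper: returns NULL when a group has too few usable values
safe_density <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(NULL)
  d <- density(x)
  data.frame(X = d$x, Y = d$y)
}

# Made-up data: Occurence stored as character; group "c" has one usable value
toy2 <- data.frame(
  Occurence = c("0.1", "0.2", "0.3", "0.4", "0.5", NA, "0.25", NA),
  Group     = c("a", "a", "a", "b", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)
toy2$Occurence <- as.numeric(toy2$Occurence)  # convert once, up front

pieces <- lapply(split(toy2$Occurence, toy2$Group), safe_density)
pieces <- Filter(Negate(is.null), pieces)     # drop the skipped groups
out <- do.call(rbind, Map(cbind, pieces, Group = names(pieces)))

unique(out$Group)  # group "c" is absent because it was skipped
```

Whether to skip sparse groups silently or to report them is a design choice; the sketch skips them only to keep the example short.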