How to compute summary statistics, standard errors, and lower and upper confidence intervals with the data.table package in R

Problem description

Question

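I have a data frame named FID (shown below), and I am trying to summarize it with the data.table package. I would like the summary to contain the following:

Desired summary data frame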

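  1. Total FID count for each month over the 3 years
  2. Mean FID frequency for each month over the 3 years
  3. Standard deviation of FID for each month over the 3 years
  4. Standard error of FID for each month over the 3 years
  5. Lower confidence limit for each month over the 3 years
  6. Upper confidence limit for each month over the 3 years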

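I can perform some of these steps individually (see below), but I would like to combine everything in the list above into a single table.

I have read extensively through Stack Overflow pages and other data.table tutorials, but I cannot find any information on how to compute the standard error and the lower and upper confidence intervals with the data.table package. Does anyone know how to do this?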

    ## Summary statistics table of FID per month over 3 years

    library(data.table)

    ## Create a data.table object
    FID.Table <- data.table(FID)

    ## Summarize FID by month
    Mean.FID <- FID.Table[, .(FID.Freq = sum(FID), mean = mean(FID), sd = sd(FID), median = median(FID)), by = .(Month)]

 ## Summary statistics table
       Month FID.Freq      mean        sd median
 1:   January      165 55.000000 10.535654     56
 2:  February      182 60.666667 29.737743     65
 3:     march      179 59.666667 33.291641     43
 4:     April      104 34.666667 16.862186     27
 5:       May      124 41.333333 49.571497     20
 6:      June       10  3.333333  5.773503      0
 7:      July       15  5.000000  4.358899      7
 8:    August      133 44.333333 21.007935     45
 9: September       97 32.333333 21.548395     34
10:   October       82 27.333333 13.051181     26
11:  November       75 25.000000 19.000000     25
12:  December      102 34.000000  4.582576     33
    

Data frame: FID

structure(list(Year = c(2015L,2015L,2016L,2017L,2017L),Month = structure(c(5L,4L,8L,1L,9L,7L,6L,2L,12L,11L,10L,3L,5L,3L),.Label = c("April","August","December","February","January","July","June","march","May","November","October","September"),class = "factor"),FID = c(65L,88L,43L,54L,98L,0L,23L,15L,33L,56L,29L,65L,53L,41L,25L,30L,44L,38L,27L,20L,45L,34L,26L,39L)),class = "data.frame",row.names = c(NA,-36L))

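Solution

Assuming you want the number of rows per month (i.e. .N) to be the sample size in the standard error, se = sd / sqrt(.N), you can then use the standard error to build a 95% CI (i.e. mean ± 1.96 * se). Alternatively, if there are missing values, you may want to use sum(!is.na(FID)) instead of .N. In short, compute the standard error for each month and then add the confidence limits as columns: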

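library(data.table)

# Convert FID to a data.table by reference
setDT(FID)

# Per-month summary; se is the standard deviation divided by sqrt(.N)
Mean.FID <- FID[, .(FID.Freq = sum(FID), mean = mean(FID), sd = sd(FID), se = sd(FID) / sqrt(.N), median = median(FID)), by = Month]

# Add the lower and upper 95% confidence limits (normal approximation, 1.96 * se)
Mean.FID[, `:=`(lo_ci = mean - se * 1.96, up_ci = mean + se * 1.96)]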

Mean.FID
        Month FID.Freq      mean        sd        se median       lo_ci     up_ci
 1:   January      165 55.000000 10.535654  6.082763     56  43.0777854 66.922215
 2:  February      182 60.666667 29.737743 17.169094     65  27.0152431 94.318090
 3:     march      179 59.666667 33.291641 19.220938     43  21.9936289 97.339704
 4:     April      104 34.666667 16.862186  9.735388     27  15.5853064 53.748027
 5:       May      124 41.333333 49.571497 28.620117     20 -14.7620965 97.428763
 6:      June       10  3.333333  5.773503  3.333333      0  -3.2000000  9.866667
 7:      July       15  5.000000  4.358899  2.516611      7   0.0674415  9.932558
 8:    August      133 44.333333 21.007935 12.128937     45  20.5606169 68.106050
 9: September       97 32.333333 21.548395 12.440972     34   7.9490287 56.717638
10:   October       82 27.333333 13.051181  7.535103     26  12.5645314 42.102135
11:  November       75 25.000000 19.000000 10.969655     25   3.4994760 46.500524
12:  December      102 34.000000  4.582576  2.645751     33  28.8143274 39.185673
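
The answer above uses .N and the normal multiplier 1.96. As a possible variation, not part of the original answer, the sketch below counts only non-missing values with sum(!is.na(FID)) and replaces 1.96 with a Student t quantile from qt(), which is a common choice when each group has only a few observations (here, three years per month). The toy table and the names toy and toy_summary are hypothetical and are not the asker's data.

```r
library(data.table)

# Hypothetical toy data (NOT the asker's FID data): three yearly values per
# month, with one missing value in February.
toy <- data.table(
  Month = rep(c("January", "February"), each = 3),
  FID   = c(10L, 20L, 30L, 5L, NA, 15L)
)

# Per-month summary that skips NA values and counts only observed rows.
toy_summary <- toy[, {
  n  <- sum(!is.na(FID))          # observed values, used instead of .N
  m  <- mean(FID, na.rm = TRUE)
  s  <- sd(FID, na.rm = TRUE)
  se <- s / sqrt(n)               # standard error of the mean
  tq <- qt(0.975, df = n - 1)     # t quantile for a two-sided 95% interval
  list(n = n, mean = m, sd = s, se = se,
       lo_ci = m - tq * se, up_ci = m + tq * se)
}, by = Month]

toy_summary
```

With few observations per group the t quantile is noticeably larger than 1.96, so these intervals are wider than the normal-approximation ones. Whether that is preferable depends on how the intervals will be used.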