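How to create a summary demographic table in R

Question

I have this data from a cohort of Alzheimer's disease patients. I would like to create a summary table (or contingency table) that shows all of the information in it. This is what I want to see for this cohort: the number of males and females, the mean age of onset, the mean age at last visit, the mean age at death, and the number of samples (IID) with apoe4any. What is the way to create such a table in R?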

dat <- structure(list(IID = structure(1:10,.Names = c("1","2","3","4","5","6","7","8","9","10"),.Label = c("NACC000875","NACC003779","NACC006805","NACC008215","NACC010067","NACC010592","NACC011413","NACC015383","NACC017476","NACC017538"),class = "factor"),cohort = structure(c(`1` = 1L,`2` = 1L,`3` = 1L,`4` = 1L,`5` = 1L,`6` = 1L,`7` = 1L,`8` = 1L,`9` = 1L,`10` = 1L
    ),.Label = "ADC8_AA",sex = structure(c(`1` = 2L,`2` = 2L,`3` = 2L,`4` = 2L,`5` = 2L,`8` = 2L,`9` = 2L,`10` = 2L),.Label = c("1","2"),status = structure(c(`1` = 1L,`7` = 2L,`10` = 2L
    ),Race = structure(c(`1` = 1L,`10` = 1L),.Label = "2",Ethnicity = structure(c(`1` = 1L,.Label = "0",age_onset = structure(c(NA,NA,1L,4L,2L,3L),.Label = c(" 63"," 67"," 71"," 79","888"),age_last_visit = structure(c(`1` = 6L,`2` = 4L,`3` = 3L,`7` = 8L,`8` = 7L,`10` = 5L),.Label = c("70","71","74","77","78","82","86","89"),age_death = structure(c(NA,3L,NA),.Label = c(" 72"," 88"," 90",apoe4any = structure(c(`1` = 1L,.Label = c("0","1"),class = "factor")),row.names = c("1",class = "data.frame")

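Answer

R uses the factor class for categorical data. If you convert the age columns (currently factors) to numeric, summary(dat) will give you most of the information you need.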

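# Columns that hold ages but are currently stored as factors
convert_to_numeric = c("age_onset","age_last_visit","age_death")
# Convert via as.character() so the labels, not the internal level codes, become numbers
dat[convert_to_numeric] = lapply(dat[convert_to_numeric],function(x) as.numeric(as.character(x)))
summary(dat)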
 #         IID        cohort   sex   status Race   Ethnicity   age_onset  age_last_visit 
 # NACC000875:1   ADC8_AA:10   1:2   1:6    2:10   0:10      Min.   :63   Min.   :70.00  
 # NACC003779:1                2:8   2:4                     1st Qu.:66   1st Qu.:70.25  
 # NACC006805:1                                              Median :69   Median :75.50  
 # NACC008215:1                                              Mean   :70   Mean   :76.70  
 # NACC010067:1                                              3rd Qu.:73   3rd Qu.:81.00  
 # NACC010592:1                                              Max.   :79   Max.   :89.00  
 # (Other)   :4                                              NA's   :6                   
 #   age_death     apoe4any
 # Min.   :72.00   0:3     
 # 1st Qu.:80.00   1:7     
 # Median :88.00           
 # Mean   :83.33           
 # 3rd Qu.:89.00           
 # Max.   :90.00           
 # NA's   :7            

See this common FAQ for why I convert the factors to numeric by way of as.character().
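
To make the reason for the as.character() step concrete, here is a minimal sketch. The vector `age` is a made-up example, not taken from `dat`; it only shows that as.numeric() applied directly to a factor returns the internal level codes instead of the values printed in the labels.

```r
# Hypothetical factor holding ages as text labels (not from the question's data)
age <- factor(c("71", "63", "79", "67"))

# Direct conversion returns the position of each value among the sorted levels
codes <- as.numeric(age)
codes
# 3 1 4 2

# Converting to character first recovers the ages themselves
values <- as.numeric(as.character(age))
values
# 71 63 79 67
```

Running summary() on `codes` would report statistics of the level positions, which is why the conversion above goes through as.character().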

If you only want to summarize the columns you mentioned, you can also subset the data:

summary(dat[c("sex",convert_to_numeric,"apoe4any")])
 # sex     age_onset  age_last_visit    age_death     apoe4any
 # 1:2   Min.   :63   Min.   :70.00   Min.   :72.00   0:3     
 # 2:8   1st Qu.:66   1st Qu.:70.25   1st Qu.:80.00   1:7     
 #       Median :69   Median :75.50   Median :88.00           
 #       Mean   :70   Mean   :76.70   Mean   :83.33           
 #       3rd Qu.:73   3rd Qu.:81.00   3rd Qu.:89.00           
 #       Max.   :79   Max.   :89.00   Max.   :90.00           
 #       NA's   :6                    NA's   :7
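
If a single-row table is preferred over the summary() printout, something along the following lines might work. This is only a sketch under stated assumptions: `toy` is a small invented data frame with the same column types as `dat` after the conversion, the output column names are hypothetical, and it assumes that apoe4any == "1" marks the samples you want to count. Which sex code means male or female depends on the cohort's codebook, so the counts are labelled by code.

```r
# Hypothetical toy data with the same column types as `dat` after conversion
toy <- data.frame(
  sex            = factor(c("2", "2", "1", "2")),
  age_onset      = c(63, NA, 71, 67),
  age_last_visit = c(70, 82, 77, 74),
  age_death      = c(NA, 90, NA, 72),
  apoe4any       = factor(c("1", "0", "1", "1"))
)

age_cols <- c("age_onset", "age_last_visit", "age_death")

# Means of the age columns, ignoring missing values
means <- colMeans(toy[age_cols], na.rm = TRUE)

# One-row table: counts for the factors, means for the ages
demo_table <- data.frame(
  n                   = nrow(toy),
  n_sex_1             = sum(toy$sex == "1", na.rm = TRUE),
  n_sex_2             = sum(toy$sex == "2", na.rm = TRUE),
  n_apoe4any          = sum(toy$apoe4any == "1", na.rm = TRUE),  # assumes "1" is the code of interest
  mean_age_onset      = means[["age_onset"]],
  mean_age_last_visit = means[["age_last_visit"]],
  mean_age_death      = means[["age_death"]]
)
demo_table
```

Replacing `toy` with `dat` should give the corresponding table for the real cohort, provided the age columns have already been converted to numeric.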