Relating a sex distribution to an age distribution in R

Problem description

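I am trying to build a data.frame in which each age group of the age distribution also carries the share of each sex within that group.

I have the following two data tables: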


            date time age confirmed deceased
   1: 2020-03-02    0  0s        32        0
   2: 2020-03-02    0 10s       169        0
   3: 2020-03-02    0 20s      1235        0
   4: 2020-03-02    0 30s       506        1
   5: 2020-03-02    0 40s       633        1
  ---                                       
1085: 2020-06-30    0 40s      1681        3
1086: 2020-06-30    0 50s      2286       15
1087: 2020-06-30    0 60s      1668       41
1088: 2020-06-30    0 70s       850       82
1089: 2020-06-30    0 80s       556      139


           date time    sex confirmed deceased
  1: 2020-03-02    0   male      1591       13
  2: 2020-03-02    0 female      2621        9
  3: 2020-03-03    0   male      1810       16
  4: 2020-03-03    0 female      3002       12
  5: 2020-03-04    0   male      1996       20
 ---                                          
238: 2020-06-28    0 female      7265      131
239: 2020-06-29    0   male      5470      151
240: 2020-06-29    0 female      7287      131
241: 2020-06-30    0   male      5495      151
242: 2020-06-30    0 female      7305      131

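Is it possible to infer the proportion of each sex within every age group?

More generally, I want to control for a third variable (the age distribution) when looking at the effect on COVID-19 deaths. Mortality tends to be higher among men than among women, and I would like to examine how the age groups are distributed across the sexes to look for a further explanation.

Thanks for any suggestions.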

Solution

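You need a way to merge the two datasets so that shares can be computed per sex-age combination. However, they do not appear to be mergeable: the only key they have in common is date, and date does not uniquely identify a row in either table.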
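To illustrate why date alone is not enough, here is a minimal sketch using a few rows copied from the first date in the question. The names age_demo, sex_demo and merged, and the suffixes _age and _sex, are made up for this illustration and are not part of the original data.

```r
# Toy subsets copied from the 2020-03-02 rows shown in the question.
age_demo <- data.frame(
  date      = "2020-03-02",
  age       = c("0s", "10s", "20s", "30s", "40s"),
  confirmed = c(32, 169, 1235, 506, 633)
)
sex_demo <- data.frame(
  date      = "2020-03-02",
  sex       = c("male", "female"),
  confirmed = c(1591, 2621)
)

# date is the only shared key, so every age row is paired with every sex row.
merged <- merge(age_demo, sex_demo, by = "date", suffixes = c("_age", "_sex"))

nrow(merged)   # 5 age rows x 2 sex rows
head(merged)
```

Each merged row simply places an age-group total next to a sex total for the same day. Neither column says how many cases fall into a given sex-age cell, so the merge adds no information about the joint distribution.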

age = read.csv("TimeAge.csv")
sex = read.csv("TimeGender.csv")

head(age)
       date time age confirmed deceased
1 2020-03-02    0  0s        32        0
2 2020-03-02    0 10s       169        0
3 2020-03-02    0 20s      1235        0
4 2020-03-02    0 30s       506        1
5 2020-03-02    0 40s       633        1
6 2020-03-02    0 50s       834        

head(sex)
        date time    sex confirmed deceased
1 2020-03-02    0   male      1591       13
2 2020-03-02    0 female      2621        9
3 2020-03-03    0   male      1810       16
4 2020-03-03    0 female      3002       12
5 2020-03-04    0   male      1996       20
6 2020-03-04    0 female      3332       12

aggregate(confirmed ~ age, data = age, FUN = sum)
  age confirmed
1  0s     16107
2 10s     68752
3 20s    345827
4 30s    137539
5 40s    168250
6 50s    230030
7 60s    158505
8 70s     82107
9 80s     54086

aggregate(confirmed ~ sex, data = sex, FUN = sum)
     sex confirmed
1 female    747467
2   male    513727
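What the two tables do support is a marginal share for each variable separately. As a rough sketch, the totals printed by aggregate() above can be turned into proportions with prop.table(). The vector names age_totals, sex_totals, age_share and sex_share are illustrative only.

```r
# Totals copied from the aggregate() output above.
age_totals <- c("0s" = 16107, "10s" = 68752, "20s" = 345827,
                "30s" = 137539, "40s" = 168250, "50s" = 230030,
                "60s" = 158505, "70s" = 82107, "80s" = 54086)
sex_totals <- c(female = 747467, male = 513727)

# prop.table() divides each entry by the sum of the vector.
age_share <- prop.table(age_totals)
sex_share <- prop.table(sex_totals)

round(age_share, 3)
round(sex_share, 3)
```

These are marginal shares only. Without a table that records sex and age together, or some additional assumption about how the two relate, they cannot be combined into a share of each sex within each age group.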