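Problem description
I am trying to create a data.frame that holds the share of each sex within each age group, in proportion to the age distribution.
I have the following two data tables: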
date time age confirmed deceased
1: 2020-03-02 0 0s 32 0
2: 2020-03-02 0 10s 169 0
3: 2020-03-02 0 20s 1235 0
4: 2020-03-02 0 30s 506 1
5: 2020-03-02 0 40s 633 1
---
1085: 2020-06-30 0 40s 1681 3
1086: 2020-06-30 0 50s 2286 15
1087: 2020-06-30 0 60s 1668 41
1088: 2020-06-30 0 70s 850 82
1089: 2020-06-30 0 80s 556 139
date time sex confirmed deceased
1: 2020-03-02 0 male 1591 13
2: 2020-03-02 0 female 2621 9
3: 2020-03-03 0 male 1810 16
4: 2020-03-03 0 female 3002 12
5: 2020-03-04 0 male 1996 20
---
238: 2020-06-28 0 female 7265 131
239: 2020-06-29 0 male 5470 151
240: 2020-06-29 0 female 7287 131
241: 2020-06-30 0 male 5495 151
242: 2020-06-30 0 female 7305 131
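Is it possible to infer the sex ratio within each age group?
More generally, I want to control for a third variable (the age distribution) when looking at the effect on corona deaths. There is a tendency for men to have a higher death rate than women. I want to examine the frequency distribution across age groups to find further explanations.
Thanks for your suggestions.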
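Solution
You need to find a way to merge the two datasets so that shares can be computed per sex-age combination. However, it seems they cannot be merged: they share no key other than date, and date is not a unique identifier in either table.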
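To illustrate why date alone does not work as a merge key, here is a minimal sketch on toy data. The values are copied from the 2020-03-02 rows shown in the question (only three age rows, for brevity); the names age_toy, sex_toy and merged are hypothetical and exist only for this example.

```r
# Toy data: a few rows for a single date, taken from the tables in the question.
age_toy <- data.frame(
  date = "2020-03-02",
  age = c("0s", "10s", "20s"),
  confirmed = c(32, 169, 1235)
)
sex_toy <- data.frame(
  date = "2020-03-02",
  sex = c("male", "female"),
  confirmed = c(1591, 2621)
)

# date is the only shared key and it repeats in both tables,
# so merge() pairs every age row with every sex row of that date.
merged <- merge(age_toy, sex_toy, by = "date", suffixes = c("_age", "_sex"))
print(merged)
nrow(merged)  # 3 age rows x 2 sex rows = 6 rows
```

Each merged row just places an age total next to a sex total for the same day. None of the rows is a count for a specific age-sex combination, so the merge by itself does not provide the joint distribution.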
age = read.csv("TimeAge.csv")
sex = read.csv("TimeGender.csv")
head(age)
date time age confirmed deceased
1 2020-03-02 0 0s 32 0
2 2020-03-02 0 10s 169 0
3 2020-03-02 0 20s 1235 0
4 2020-03-02 0 30s 506 1
5 2020-03-02 0 40s 633 1
6 2020-03-02 0 50s 834
head(sex)
date time sex confirmed deceased
1 2020-03-02 0 male 1591 13
2 2020-03-02 0 female 2621 9
3 2020-03-03 0 male 1810 16
4 2020-03-03 0 female 3002 12
5 2020-03-04 0 male 1996 20
6 2020-03-04 0 female 3332 12
aggregate(confirmed ~ age, data = age, FUN = sum)
age confirmed
1 0s 16107
2 10s 68752
3 20s 345827
4 30s 137539
5 40s 168250
6 50s 230030
7 60s 158505
8 70s 82107
9 80s 54086
aggregate(confirmed ~ sex, data = sex, FUN = sum)
sex confirmed
1 female 747467
2 male 513727
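What each table does support is a marginal share. The sketch below computes the share of each sex on the most recent date, using the last four rows of the sex table from the question. It rests on an assumption that the post does not confirm: that confirmed is a running total (the values shown only grow over time), so a single date is used instead of a sum over all dates. The names sex_toy and latest are hypothetical.

```r
# Toy data: the last two dates of the sex table shown in the question.
sex_toy <- data.frame(
  date = c("2020-06-29", "2020-06-29", "2020-06-30", "2020-06-30"),
  sex = c("male", "female", "male", "female"),
  confirmed = c(5470, 7287, 5495, 7305)
)

# Keep only the most recent date (ISO-formatted date strings sort chronologically).
latest <- sex_toy[sex_toy$date == max(sex_toy$date), ]

# Marginal share of each sex among confirmed cases on that date.
# Assumption: confirmed is cumulative, so one date already covers the whole period.
latest$share <- latest$confirmed / sum(latest$confirmed)
print(latest)
```

The same steps would apply to the age table with age in place of sex. This still yields two separate marginal distributions; it does not give the sex share inside each age group.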