How can I get the population size instead of person-years when using stsplit and strate in Stata?

Problem description

How can I get the number of people in each group, rather than person-years, with strate in Stata?

Using cohort data, I created a survival dataset in Stata as follows:

stset end,id(person) failure(event==1) scale(365.25) enter(time start) origin(time dob)
            
stsplit ageband,at(0 (1) 5) after(time=dob) 
stsplit year,after(time=mdy(1,1,1960)) at(40 (1) 45) 
replace year = 1960 + year
            
strate ageband year sex,per(100000) output("rates.dta",replace)

where each person is born on dob, enters the study on the start date and leaves on the end date. If a person has an event (event == 1) during this period, they leave on the date of the event.

stset creates the survival data, and stsplit splits the dataset into age bands (ages 0-5) and calendar years (2000-2005).
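Why at(40 (1) 45) stands for 2000 to 2005: Stata dates count days from 01jan1960, and scale(365.25) makes one unit of analysis time equal to 365.25 days. Assuming stsplit measures the at() values in those analysis-time units from the after() date, the cut at 40 lands exactly on 01jan2000, but later cuts drift by a fraction of a day (41 falls 0.75 day before 01jan2001), so the year bands match calendar years only to within about a day. A quick check:

```stata
* Stata dates are days since 01jan1960; one analysis-time unit is 365.25 days here
display mdy(1,1,1960) + 40*365.25   // 14610
display mdy(1,1,2000)               // 14610
display mdy(1,1,1960) + 41*365.25   // 14975.25
display mdy(1,1,2001)               // 14976
```

Adding 1960 to the split variable, as in `replace year = 1960 + year`, then turns 40 to 45 into the labels 2000 to 2005.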

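strate computes rates for each distinct combination of ageband, year and sex and stores the aggregated data in "rates.dta". For each combination of ageband, year and sex, this summary gives _D, the number of events, and _Y, the person-years, which are respectively the numerator and the denominator of the rate.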
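To see why _Y cannot simply be read as a head count: after stset, each record contributes _d (1 if the record ends in the event) to the event count and _t - _t0 units of analysis time (years, because of scale(365.25)) to the person-time, so the same person adds years, or fractions of a year, rather than 1. A minimal sketch of those two sums on a hypothetical two-person toy dataset (all values invented; dates are plain day counts):

```stata
* hypothetical toy data: dates are day counts, 365.25 days = 1 year
clear
input person sex dob start end event
1 1 0 0      730.5   0
2 1 0 365.25 1095.75 1
end

stset end, id(person) failure(event==1) scale(365.25) enter(time start) origin(time dob)

* person-time of each record, in years of analysis time
generate double py = _t - _t0

* events and person-years per sex; D and Y play the roles of _D and _Y
keep if _st == 1
collapse (sum) D=_d Y=py, by(sex)
list
```

Two people yield four person-years (two each), so the number of people has to be counted separately, for example by tagging each person once per group.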

I want to calculate the proportion of events _D out of the total number of people in each group.

Is there a way to make _Y the total number of people in the group, for example for ageband = 0, year = 2000, sex = 1?

Otherwise, how else can I get the number of people in each group?

Solution

My solution:

* ADD before stset to get total number of people in the dataset
egen tag = tag(person)
egen N_total = total(tag)
drop tag

stset end,id(person) failure(event==1) scale(365.25) enter(time start) origin(time dob)
            
stsplit ageband,at(0 (1) 5) after(time=dob) 
stsplit year,after(time=mdy(1,1,1960)) at(40 (1) 45) 
replace year = 1960 + year

* ADD for each variable / group that you want to find the number of people in
egen tag = tag(person ageband)
egen N_ageband = total(tag),by(ageband)
drop tag

egen tag = tag(person year)
egen N_year = total(tag),by(year)
drop tag

egen tag = tag(person sex)
egen N_sex = total(tag),by(sex)
drop tag

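* RUN strate first: collapse below replaces the stset data, after which strate would fail
strate ageband year sex,per(100000) output("rates.dta",replace)

* KEEP variables of interest
keep ageband year sex N_total N_ageband N_year N_sex

* one row per ageband-year-sex cell; each N_* is constant within a cell, so the mean returns it unchanged
collapse (mean) N_total N_ageband N_year N_sex,by(ageband sex year)

* SAVE
save "proportions.dta",replace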

Then, for each grouping (total, ageband, year, sex), merge your rates.dta file with the proportions.dta file:

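foreach var in total ageband year sex {
    * one row per ageband-year-sex cell carrying this grouping's distinct-person count
    use "proportions.dta",clear
    collapse (mean) N_`var',by(ageband year sex)
    merge 1:1 ageband year sex using "rates.dta",keep(match) nogen

    * events add up across cells; each N_ count is constant within its own grouping, so take its mean
    if "`var'"=="total" collapse (sum) _D (mean) N_total
    else collapse (sum) _D (mean) N_`var',by(`var')

    * proportion of people in the group who had an event
    generate prop = _D/N_`var'

    * do any other processing with the results

    save "proportions_by_`var'.dta",replace
}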
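The counts above are marginal (per ageband, per year, per sex). For the joint cell the question asks about (for example ageband = 0, year = 2000, sex = 1), a different technique from the egen tag() approach is to flag each person once per cell with bysort and then collapse, which gives events, head count and proportion in one pass. A minimal sketch on hypothetical toy records standing in for the stsplit output; in real data, stset's _d would replace the invented died indicator, and you would first keep if _st == 1:

```stata
* hypothetical toy records standing in for the split data (one row per person per cell)
clear
input person ageband year sex died
1 0 2000 1 0
1 0 2001 1 0
1 1 2001 1 1
2 0 2000 1 0
2 1 2000 1 0
3 0 2000 2 1
end

* flag each person once within each ageband-year-sex cell
bysort ageband year sex person: generate byte first = (_n == 1)

* events (D), distinct people (N) and the proportion with an event, per cell
collapse (sum) D=died N=first, by(ageband year sex)
generate prop = D/N
list
```

Person 1 is counted in two age-0 cells because their follow-up crosses a calendar year, so the cell head counts need not add up to the number of people in the study.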
