Estimating a density from interval [start, stop] data in R

Problem description

Description

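This question is motivated by clinical/epidemiological studies, which typically enroll patients and then follow them for varying lengths of time.

The age distribution at study entry is often of interest and is easy to assess, but occasionally the distribution of age at any point during the study is of interest as well.

My question is whether there is a way to estimate such a density from interval data (e.g. [age_start, age_stop]) without expanding the data as shown below. The long-format approach seems inelegant, to say nothing of its memory usage!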

Reproducible example using data from the survival package

#### Prep Data ###
library(survival)
library(ggplot2)
library(dplyr)

data(colon,package = 'survival')
# example using the colon dataset from the survival package
ccdeath <- colon %>%
  # use data on time to death (not recurrence)
  filter(etype == 2) %>%
  # age at end of follow-up (death or censoring)
  mutate(age_last = age + (time / 365.25))

#### Distribution Using Single Value ####
# age at study entry
ggplot(ccdeath,aes(x = age)) +
  geom_density() +
  labs(title = "Fig 1.",x = "Age at Entry (years)",y = "Density")

#### Using Person-Month Level Data ####
# create counting-process/person-time dataset
ccdeath_cp <- survSplit(Surv(age,age_last,status) ~ .,
                        data = ccdeath,
                        cut = seq(from = floor(min(ccdeath$age)),
                                  to = ceiling(max(ccdeath$age_last)),
                                  by = 1/12))

nrow(ccdeath_cp) # over 50,000 rows

# distribution of age at person-month level
ggplot(ccdeath_cp,aes(x = age)) +
  geom_density() +
  labs(title = "Figure 2: Density based on approximate person-months",x = "Age (years)",y = "Density")

#### Using Person-Day Level Data ####
# create counting-process/person-time dataset
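ccdeath_cp <- survSplit(Surv(age,age_last,status) ~ .,
                        data = ccdeath,
                        cut = seq(from = floor(min(ccdeath$age)),
                                  to = ceiling(max(ccdeath$age_last)),
                                  by = 1/365.25))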

nrow(ccdeath_cp) # over 1.5 million rows!

# distribution of age at person-day level
ggplot(ccdeath_cp,aes(x = age)) +
  geom_density() +
  labs(title = "Figure 3: Density based on person-days",x = "Age (years)",y = "Density")

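Note: although I tagged this question "survival" because I thought it would attract people familiar with the area, I am not interested in time to event here, only in the overall age distribution across all study time.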
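To make the target quantity concrete, here is a minimal sketch of what the person-month and person-day plots approximate as the split gets finer, assuming each person contributes follow-up time uniformly over their own [start, stop) interval. Under that assumption the unsmoothed person-time density at age a is the number of people under follow-up at a divided by the total person-time. The helper name `interval_density` and the toy ages are hypothetical and are not part of the survival package; this is an illustration, not a tested replacement for `geom_density()`.

```r
# Toy interval data (hypothetical values): one row per person.
age_start <- c(50, 60, 62, 70)
age_stop  <- c(55, 61, 70, 72)

# Unsmoothed person-time density at each age in `a`:
# (number of people under follow-up at that age) / (total person-time).
# Dividing by total person-time makes the curve integrate to 1.
interval_density <- function(a, start, stop) {
  total_time <- sum(stop - start)
  at_risk <- vapply(
    a,
    function(x) as.numeric(sum(start <= x & x < stop)),
    numeric(1)
  )
  at_risk / total_time
}

step <- 0.01
grid <- seq(45, 75, by = step)
dens <- interval_density(grid, age_start, age_stop)

# Riemann-sum check: should be close to 1
sum(dens) * step
```

With the data from the example above, the same function could be called as `interval_density(grid, ccdeath$age, ccdeath$age_last)` on a grid of ages, which works from one row per patient rather than one row per person-day. The result is a step function, so it will look rougher than the kernel-smoothed curves in Figures 2 and 3.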


Tags: kernel-density, r, survival