Problem description
Consider this data frame:
    data <- data.frame(
      group = rep(letters[1:3], c(4, 5, 4)),
      Date = as.Date(c("2010-08-09", "2010-09-11", "2010-09-12", "2010-09-18",
                       "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                       "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21"))
    )
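We have three groups, each observed on different dates. I want to find the first and last date for each group (preferably using dplyr). How should I handle the dates?

Edit: I'm adding this to clarify why I'm asking, which has to do with R's ability to interpret real time (dates). data2 is identical to data above, except that I swapped the first two dates, so the observations for group == a are no longer in true chronological order (earliest to latest):

    data2 <- data.frame(
      group = rep(letters[1:3], c(4, 5, 4)),
      Date = as.Date(c("2010-09-11", "2010-08-09", "2010-09-12", "2010-09-18",
                       "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                       "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21"))
    )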
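So 2010-09-11 is later than 2010-08-09 in real time, but the two dates do not appear in that chronological order in the data frame. Now, if we do this:

    library(dplyr)
    data2 %>%
      group_by(group) %>%
      summarise(FirsDate = first(Date), LastDate = last(Date))

we get:

      group FirsDate   LastDate
      <fct> <date>     <date>
    1 a     2010-09-11 2010-09-18
    2 b     2014-03-15 2014-03-25
    3 c     2016-05-02 2016-09-21

For group a, FirsDate is 2010-09-11, which is the first row in the data rather than the earliest date (2010-08-09).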
Solution
I suggest an approach using the first() and last() functions from the dplyr package:

    library(dplyr)
    # Data
    data <- data.frame(
      group = rep(letters[1:3], c(4, 5, 4)),
      Date = as.Date(c("2010-08-09", "2010-09-11", "2010-09-12", "2010-09-18",
                       "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                       "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21"))
    )
    # Code: mutate() keeps every row and adds the two columns
    data %>%
      group_by(group) %>%
      mutate(FirsDate = first(Date), LastDate = last(Date))

Output:

    # A tibble: 13 x 4
    # Groups:   group [3]
       group Date       FirsDate   LastDate
       <fct> <date>     <date>     <date>
     1 a     2010-08-09 2010-08-09 2010-09-18
     2 a     2010-09-11 2010-08-09 2010-09-18
     3 a     2010-09-12 2010-08-09 2010-09-18
     4 a     2010-09-18 2010-08-09 2010-09-18
     5 b     2014-03-15 2014-03-15 2014-03-25
     6 b     2014-03-16 2014-03-15 2014-03-25
     7 b     2014-03-20 2014-03-15 2014-03-25
     8 b     2014-03-21 2014-03-15 2014-03-25
     9 b     2014-03-25 2014-03-15 2014-03-25
    10 c     2016-05-02 2016-05-02 2016-09-21
    11 c     2016-08-02 2016-05-02 2016-09-21
    12 c     2016-08-03 2016-05-02 2016-09-21
    13 c     2016-09-21 2016-05-02 2016-09-21
If you only want one row per group, you can use summarise():

    # Code 2
    data %>%
      group_by(group) %>%
      summarise(FirsDate = first(Date), LastDate = last(Date))

Output:

    # A tibble: 3 x 3
      group FirsDate   LastDate
      <fct> <date>     <date>
    1 a     2010-08-09 2010-09-18
    2 b     2014-03-15 2014-03-25
    3 c     2016-05-02 2016-09-21
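Update: for data2, where the rows are not in chronological order, use min() and max() instead. They compare the date values themselves, so row order does not matter:

    # Code
    data2 %>%
      group_by(group) %>%
      summarise(FirsDate = min(Date), LastDate = max(Date))

Output:

    # A tibble: 3 x 3
      group FirsDate   LastDate
      <fct> <date>     <date>
    1 a     2010-08-09 2010-09-18
    2 b     2014-03-15 2014-03-25
    3 c     2016-05-02 2016-09-21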
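As a hedged sketch of two other order-independent options: sort the rows before taking positional values, or pass order_by to first() and last(). This assumes a dplyr version in which first() and last() accept an order_by argument. The result names by_arrange and by_order are illustrative only, and data2 is rebuilt here from the question's description (data with the first two dates swapped).

```r
library(dplyr)

# data2 as described in the question: data with the first two dates swapped
data2 <- data.frame(
  group = rep(letters[1:3], c(4, 5, 4)),
  Date = as.Date(c("2010-09-11", "2010-08-09", "2010-09-12", "2010-09-18",
                   "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                   "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21"))
)

# Option 1: sort by group and date, then take the positional first/last
by_arrange <- data2 %>%
  arrange(group, Date) %>%
  group_by(group) %>%
  summarise(FirsDate = first(Date), LastDate = last(Date))

# Option 2: leave the rows as they are and let first()/last() order the values
by_order <- data2 %>%
  group_by(group) %>%
  summarise(FirsDate = first(Date, order_by = Date),
            LastDate = last(Date, order_by = Date))
```

Both should agree with the min()/max() result on data2. min() and max() are the most direct choice when only the dates are needed. Sorting is useful when other columns from the earliest or latest row are wanted as well.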
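Another approach to try: keep only the first and last row of each group. Like first() and last(), this is positional, so it assumes the rows are already in chronological order within each group (as they are in data):

    library(dplyr)
    # Note: this reuses the name data2 for the result
    data2 <- data %>%
      group_by(group) %>%
      filter(row_number() == 1 | row_number() == n()) %>%
      ungroup()
    #   group Date
    #   <chr> <date>
    # 1 a     2010-08-09
    # 2 a     2010-09-18
    # 3 b     2014-03-15
    # 4 b     2014-03-25
    # 5 c     2016-05-02
    # 6 c     2016-09-21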
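Because the row filter above depends on row position, applying it to unordered data needs a sort first. A minimal sketch, assuming the swapped-date data from the question (named unordered here to avoid clashing with other objects, and first_last_rows is likewise an illustrative name), using arrange() with .by_group = TRUE:

```r
library(dplyr)

# Same dates as the question's data2: the first two rows of group a are swapped
unordered <- data.frame(
  group = rep(letters[1:3], c(4, 5, 4)),
  Date = as.Date(c("2010-09-11", "2010-08-09", "2010-09-12", "2010-09-18",
                   "2014-03-15", "2014-03-16", "2014-03-20", "2014-03-21", "2014-03-25",
                   "2016-05-02", "2016-08-02", "2016-08-03", "2016-09-21"))
)

first_last_rows <- unordered %>%
  group_by(group) %>%
  arrange(Date, .by_group = TRUE) %>%                         # sort dates within each group
  filter(row_number() == 1 | row_number() == n()) %>%         # keep earliest and latest row
  ungroup()
```

A group with a single observation would contribute only one row here, since its first and last row are the same.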