Problem description
I have a data frame like this:
name count
a 3
a 5
a 8
b 2
a 9
b 7
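I want to compute the row-to-row difference within each group of name. My code is:
data %>% group_by(name) %>% mutate(last_count = lag(count), diff = count - last_count)
However, I get the result shown in the table below: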
name count last_count diff
a 3 NA NA
a 5 3 2
a 8 5 3
b 2 NA NA
a 9 8 1
b 7 2 5
But what I want is this, where only consecutive rows with the same name are compared:
name count last_count diff
a 3 NA NA
a 5 3 2
a 8 5 3
b 2 NA NA
a 9 NA NA
b 7 NA NA
Thanks in advance to anyone who can help me solve this!
Solution
This works:
> library(dplyr)
> df %>% mutate(last_count = case_when(name == lag(name) ~ lag(count),TRUE ~ NA_real_),diff = case_when(name == lag(name) ~ count - lag(count),TRUE ~ NA_real_))
# A tibble: 6 x 4
name count last_count diff
<chr> <dbl> <dbl> <dbl>
1 a 3 NA NA
2 a 5 3 2
3 a 8 5 3
4 b 2 NA NA
5 a 9 NA NA
6 b 7 NA NA
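We can use rleid (from data.table) to create a grouping column from runs of adjacent matching values in the 'name' column, and then take the difference within each run: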
library(dplyr)
library(data.table)
data %>%
  group_by(grp = rleid(name)) %>%
  mutate(last_count = lag(count), diff = count - last_count) %>%
  ungroup() %>%
  select(-grp)
-output
# A tibble: 6 x 4
# name count last_count diff
# <chr> <int> <int> <int>
#1 a 3 NA NA
#2 a 5 3 2
#3 a 8 5 3
#4 b 2 NA NA
#5 a 9 NA NA
#6 b 7 NA NA
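To see what the grouping column looks like, here is a minimal sketch that reproduces the run ids in base R, without loading data.table. The variable names `runs` and `run_id` are only illustrative; `rleid(name)` should give the same ids for this input.

```r
# The 'name' column from the question, in its original row order
name <- c("a", "a", "a", "b", "a", "b")

# rle() finds runs of adjacent equal values;
# repeating each run's index by its length gives one id per row
runs <- rle(name)
run_id <- rep(seq_along(runs$values), runs$lengths)

print(run_id)
# [1] 1 1 1 2 3 4
```

Because the second block of "a" gets its own id (3), `lag()` within that group starts again from NA, which is the behaviour asked for.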
Or use base R with ave and rle:
data$diff <- with(data, ave(count, with(rle(name), rep(seq_along(values), lengths)), FUN = function(x) c(NA, diff(x))))
data
data <- structure(list(name = c("a","a","a","b","a","b"),count = c(3L,5L,8L,2L,9L,7L)),class = "data.frame",row.names = c(NA,-6L))
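As a self-contained check of the base R approach, the sketch below rebuilds the sample data from the question's table and wraps the ave/rle idea in a small helper. The helper name `diff_within_runs` is hypothetical and not part of any package.

```r
# Sample data matching the table in the question
data <- data.frame(
  name  = c("a", "a", "a", "b", "a", "b"),
  count = c(3L, 5L, 8L, 2L, 9L, 7L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: difference from the previous value,
# restarted at every run of adjacent equal keys
diff_within_runs <- function(value, key) {
  runs <- rle(key)
  run_id <- rep(seq_along(runs$values), runs$lengths)
  ave(value, run_id, FUN = function(x) c(NA, diff(x)))
}

data$diff <- diff_within_runs(data$count, data$name)
# The previous value follows from count and diff (NA stays NA)
data$last_count <- data$count - data$diff

print(data$diff)
# [1] NA  2  3 NA NA NA
```

Note that `rle()` expects a plain vector, so a factor `name` column would need `as.character()` first.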