Problem description
For simplicity, I created a small dummy dataset.
Please note: dates are in yyyy-mm-dd format.
The dataset DFc holds the same rows with visits replaced by its cumulative sum (cumsum()); the original dataset DF is:
DF <- tibble(
  country = rep(c("France", "England", "Spain"), each = 4),
  date = rep(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"), times = 3),
  visits = c(10, 16, 14, 12, 11, 9, 12, 14, 13, 13, 15, 10)
)
# A tibble: 12 x 3
   country date       visits
   <chr>   <chr>       <dbl>
 1 France  2020-01-01     10
 2 France  2020-01-02     16
 3 France  2020-01-03     14
 4 France  2020-01-04     12
 5 England 2020-01-01     11
 6 England 2020-01-02      9
 7 England 2020-01-03     12
 8 England 2020-01-04     14
 9 Spain   2020-01-01     13
10 Spain   2020-01-02     13
11 Spain   2020-01-03     15
12 Spain   2020-01-04     10
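Say I only have the dataset DFc. Which R functions can I use to recreate the visits column (as shown in dataset DF), essentially undoing cumsum()?
I was told I could incorporate the lag() function, but I am not sure how.
Also, how would the code change if the dates were weeks apart rather than one day apart?
Any help would be greatly appreciated :)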
Solution
Starting from a toy example:
library(dplyr)
DF <- tibble(
  country = rep(c("France", "England", "Spain"), each = 4),
  date = rep(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"), times = 3),
  visits = c(10, 16, 14, 12, 11, 9, 12, 14, 13, 13, 15, 10)
)
# running total of visits within each country, in row order
DF <- DF %>%
  group_by(country) %>%
  mutate(cumulative_visits = cumsum(visits)) %>%
  ungroup()
I propose two approaches:
- diff
- lag [as you specifically requested]
DF %>%
  group_by(country) %>%
  mutate(
    decum_visits1 = c(cumulative_visits[1], diff(cumulative_visits)),
    decum_visits2 = cumulative_visits - lag(cumulative_visits, default = 0)
  ) %>%
  ungroup()
#> # A tibble: 12 x 6
#>    country date       visits cumulative_visits decum_visits1 decum_visits2
#>    <chr>   <chr>       <dbl>             <dbl>         <dbl>         <dbl>
#>  1 France  2020-01-01     10                10            10            10
#>  2 France  2020-02-01     16                26            16            16
#>  3 France  2020-03-01     14                40            14            14
#>  4 France  2020-04-01     12                52            12            12
#>  5 England 2020-01-01     11                11            11            11
#>  6 England 2020-02-01      9                20             9             9
#>  7 England 2020-03-01     12                32            12            12
#>  8 England 2020-04-01     14                46            14            14
#>  9 Spain   2020-01-01     13                13            13            13
#> 10 Spain   2020-02-01     13                26            13            13
#> 11 Spain   2020-03-01     15                41            15            15
#> 12 Spain   2020-04-01     10                51            10            10
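To see why the two columns agree, here is a minimal base R sketch on a single group, using the France values from the output above. The variable names are only illustrative, and the lag step is written with head() rather than dplyr::lag() so the snippet runs without any packages.

```r
# Visits for one country (the France rows above)
visits <- c(10, 16, 14, 12)
cumulative <- cumsum(visits)

# diff approach: keep the first value, then take successive differences
via_diff <- c(cumulative[1], diff(cumulative))

# lag approach: subtract the series shifted down by one row, padded with 0
# (this is what lag(cumulative, default = 0) does in dplyr)
via_lag <- cumulative - c(0, head(cumulative, -1))

print(via_diff)
print(via_lag)
```

Both vectors should reproduce visits, because cumsum() adds one new value per row and each difference removes everything accumulated before that row.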
If a date is missing, as in the example below:
DF1 <- DF %>%
  # set to date!
  mutate(date = as.Date(date)) %>%
  # remove one date just for the sake of the example
  filter(date != as.Date("2020-02-01"))
Then I suggest you complete the dates, filling visits with zero and cumulative_visits with the last value seen. You can then invert cumsum the same way as before.
DF1 %>%
  group_by(country) %>%
  # complete and fill with zero!
  tidyr::complete(date = seq.Date(min(date), max(date), by = "month"), fill = list(visits = 0)) %>%
  # fill cumulative with the last available value
  tidyr::fill(cumulative_visits) %>%
  # reset in the same way
  mutate(
    decum_visits1 = c(cumulative_visits[1], diff(cumulative_visits)),
    decum_visits2 = cumulative_visits - lag(cumulative_visits, default = 0)
  ) %>%
  ungroup()
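On the question of dates a week apart: neither diff() nor lag() looks at the dates, only at row order, so the same code should apply as long as rows are sorted by date within each country. When completing gaps, only the by argument of seq.Date() would change, from "month" to "week". Below is a small sketch with made-up weekly data (the weekly data frame and its values are hypothetical), written in base R with ave() instead of group_by()/mutate() so it runs without extra packages.

```r
# Hypothetical weekly cumulative data, two countries, three weeks each
weekly <- data.frame(
  country = rep(c("France", "England"), each = 3),
  date = rep(seq(as.Date("2020-01-01"), by = "week", length.out = 3), times = 2),
  cumulative_visits = c(10, 26, 40, 11, 20, 32)
)

# Sort by country and date so differences follow time order
weekly <- weekly[order(weekly$country, weekly$date), ]

# Undo cumsum within each country: first value, then successive differences
weekly$visits <- ave(
  weekly$cumulative_visits, weekly$country,
  FUN = function(x) c(x[1], diff(x))
)

print(weekly)
```

With dplyr, the mutate() call shown earlier stays unchanged; adding arrange(country, date) before it would play the role of the order() step here.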
This is a general solution. It is sloppy because, as you can see, it does not return foo[1], but that can be worked around. (The output of the last line can be reversed.) I leave that as an "exercise for the reader".