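Is there an R function that can undo cumsum and recreate the original non-cumulative column in a dataset?

Problem description

For simplicity, I have created a small dummy dataset.

Please note: dates are in yyyy-mm-dd format.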

Here is dataset DF. Dataset DFc holds the same rows, with visits replaced by its cumulative sum within each country:

# A tibble: 12 x 3
   country date       visits
   <chr>   <chr>       <dbl>
 1 France  2020-01-01     10
 2 France  2020-01-02     16
 3 France  2020-01-03     14
 4 France  2020-01-04     12
 5 England 2020-01-01     11
 6 England 2020-01-02      9
 7 England 2020-01-03     12
 8 England 2020-01-04     14
 9 Spain   2020-01-01     13
10 Spain   2020-01-02     13
11 Spain   2020-01-03     15
12 Spain   2020-01-04     10

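Say I only have dataset DFc. Which R functions can I use to recreate the visits column (as shown in dataset DF) and essentially undo/reverse cumsum()?

I have been told that I could incorporate the lag() function, but I am not sure how to do that.

Also, how would the code change if the dates were spaced a week apart rather than a day?

Any help would be much appreciated :)

Solution

Starting from a toy example: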

library(dplyr)

DF <- tibble(
  country = rep(c("France", "England", "Spain"), each = 4),
  date = rep(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"), times = 3),
  # 12 values, one per row, matching the output shown further down
  visits = c(10, 16, 14, 12, 11, 9, 12, 14, 13, 13, 15, 10)
)


DF <- DF %>% 
  group_by(country) %>% 
  mutate(cumulative_visits = cumsum(visits)) %>% 
  ungroup()

I propose two approaches:

diff
lag [as you specifically requested]

DF %>%
  group_by(country) %>%
  mutate(
    decum_visits1 = c(cumulative_visits[1], diff(cumulative_visits)),
    decum_visits2 = cumulative_visits - lag(cumulative_visits, default = 0)
  ) %>% 
  ungroup()

#> # A tibble: 12 x 6
#>    country date       visits cumulative_visits decum_visits1 decum_visits2
#>    <chr>   <chr>       <dbl>             <dbl>         <dbl>         <dbl>
#>  1 France  2020-01-01     10                10            10            10
#>  2 France  2020-02-01     16                26            16            16
#>  3 France  2020-03-01     14                40            14            14
#>  4 France  2020-04-01     12                52            12            12
#>  5 England 2020-01-01     11                11            11            11
#>  6 England 2020-02-01      9                20             9             9
#>  7 England 2020-03-01     12                32            12            12
#>  8 England 2020-04-01     14                46            14            14
#>  9 Spain   2020-01-01     13                13            13            13
#> 10 Spain   2020-02-01     13                26            13            13
#> 11 Spain   2020-03-01     15                41            15            15
#> 12 Spain   2020-04-01     10                51            10            10
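
The same per-group differencing can also be sketched without dplyr, starting from DFc alone. This is only an illustration under assumptions: DFc is rebuilt here by hand from the cumulative values in the output above, the column name cumulative_visits is borrowed from the toy example, and ave() from base R is swapped in for group_by() + mutate().

```r
# DFc rebuilt by hand: only the cumulative column is available (assumed layout)
DFc <- data.frame(
  country = rep(c("France", "England", "Spain"), each = 4),
  date = rep(as.Date("2020-01-01") + 0:3, times = 3),
  cumulative_visits = c(10, 26, 40, 52, 11, 20, 32, 46, 13, 26, 41, 51)
)

# Differencing only makes sense if rows are in date order within each country
DFc <- DFc[order(DFc$country, DFc$date), ]

# Within each country: keep the first cumulative value, then take successive differences
DFc$visits <- ave(
  DFc$cumulative_visits, DFc$country,
  FUN = function(v) c(v[1], diff(v))
)

DFc
```

Note that the spacing of the dates plays no role here: only the row order within each country matters.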

If a date is missing, as in the following example:

DF1 <- DF %>% 
  
  # set to date!
  mutate(date = as.Date(date)) %>%
  
  # remove one date just for the sake of the example
  filter(date != as.Date("2020-02-01"))

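Then I suggest that you complete the dates, filling visits with zero and cumulative_visits with the last value seen. After that, you can get the inverse of cumsum in the same way as before.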

DF1 %>% 
  group_by(country) %>% 
  
  # complete and fill with zero!
  tidyr::complete(date = seq.Date(min(date),max(date),by = "month"),fill = list(visits = 0)) %>% 
  
  # fill cumulative with the last available value
  tidyr::fill(cumulative_visits) %>%
  
  # reset in the same way
  mutate(
    decum_visits1 = c(cumulative_visits[1], diff(cumulative_visits)),
    decum_visits2 = cumulative_visits - lag(cumulative_visits, default = 0)
  ) %>% 
  ungroup()
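
For dates that are a week apart, the same pattern should carry over with by = "week" in the sequence. The following is a minimal sketch, assuming a made-up single-country dataset (weekly, not part of the original example) in which one week is missing:

```r
library(dplyr)
library(tidyr)

# Hypothetical weekly data: the week of 2020-01-20 is missing
weekly <- tibble(
  country = rep("France", 3),
  date = as.Date(c("2020-01-06", "2020-01-13", "2020-01-27")),
  cumulative_visits = c(10, 26, 52)
)

weekly_decum <- weekly %>%
  group_by(country) %>%
  # add the missing weeks between the first and last date of each country
  complete(date = seq(min(date), max(date), by = "week")) %>%
  # carry the last known cumulative value forward into the added rows
  fill(cumulative_visits) %>%
  # undo the cumulative sum
  mutate(visits = cumulative_visits - lag(cumulative_visits, default = 0)) %>%
  ungroup()

weekly_decum
```

One consequence of this design is that the added week gets zero visits, and everything accumulated during the gap is attributed to the next observed date.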

Here is a generic solution. It is sloppy because, as you can see, it does not return foo[1], but that can be fixed. (As can reversing the output of the last line.) I will leave that as "an exercise for the reader".

foo[1]
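
As a hedged illustration of what a generic, vector-level inverse of cumsum could look like, here is a small base R sketch. The helper name decumsum and the sample values in foo are assumptions made for this example and are not taken from the answer above; unlike the sloppy version described, this one keeps the first element.

```r
# Hypothetical helper: invert cumsum() on a plain numeric vector
decumsum <- function(x) {
  # an empty input has nothing to difference
  if (length(x) == 0) return(x)
  # the first element is its own increment; the rest are successive differences
  c(x[1], diff(x))
}

# Made-up sample data
foo <- cumsum(c(3, 1, 4, 1, 5))

decumsum(foo)
```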
