Problem description
A data frame and some variables:
library(tidyverse)
library(lubridate)
budget_2020_q4 <- 1000000
budget_2021_q1 <- 2000000
budget_2021_q2 <- 3000000
budget_2021_q3 <- 3000000
budget_2021_q4 <- 2000000
calendar <- data.frame(
cohort = seq('2020-10-01' %>% ymd,'2021-12-31' %>% ymd,by = '1 days')) %>%
mutate(Quarter = quarter(cohort,with_year = T))
I now have a data frame showing dates and the quarter each date falls in:
calendar %>% head
cohort Quarter
1 2020-10-01 2020.4
2 2020-10-02 2020.4
3 2020-10-03 2020.4
4 2020-10-04 2020.4
5 2020-10-05 2020.4
6 2020-10-06 2020.4
I also know how many days fall in each quarter:
calendar$Quarter %>% table
.
2020.4 2021.1 2021.2 2021.3 2021.4
92 90 91 92 92
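I would like to mutate a new column, 'daily_budget', that takes the budget for the quarter and divides it by the number of dates in that quarter.
For example, the budget for 2020 Q4 is 1000000 and that quarter has 92 days, so the daily budget is 1000000/92 = 10869.57.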
After mutate(Quarter = quarter(cohort,with_year = T)), is there a way to integrate this calculation into my dplyr pipeline?
Solution
First, let's put the budgets in a data frame:
budgets <- c(budget_2020_q4 = 1000000,budget_2021_q1 = 2000000,budget_2021_q2 = 3000000,budget_2021_q3 = 3000000,budget_2021_q4 = 2000000) %>%
enframe(name = "Quarter",value = "budget") %>%
mutate(Quarter = as.numeric(str_replace(str_remove(Quarter,"budget_"),"_q",".")))
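If parsing the variable names feels fragile, the same lookup table could be written out by hand. The following is only a sketch under that assumption: the name budgets_direct is hypothetical, the Quarter keys are typed in to match the year.quarter values shown in the question, and the budget figures are copied from the question.

```r
library(tibble)

# Hypothetical alternative: spell out the Quarter keys directly instead of
# deriving them from names such as "budget_2020_q4".
budgets_direct <- tribble(
  ~Quarter, ~budget,
  2020.4,   1000000,
  2021.1,   2000000,
  2021.2,   3000000,
  2021.3,   3000000,
  2021.4,   2000000
)
```

Either version has one row per quarter with a numeric Quarter column, which is the shape that a join by "Quarter" expects.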
Then it is just a matter of counting the rows per Quarter (add_count is the tidyverse alternative to table), joining the budgets, and dividing the two:
calendar %>%
add_count(Quarter) %>%
left_join(budgets,by = "Quarter") %>%
mutate(budget_by_day = budget / n)
which gives
cohort Quarter n budget budget_by_day
1 2020-10-01 2020.4 92 1e+06 10869.57
2 2020-10-02 2020.4 92 1e+06 10869.57
3 2020-10-03 2020.4 92 1e+06 10869.57
4 2020-10-04 2020.4 92 1e+06 10869.57
5 2020-10-05 2020.4 92 1e+06 10869.57
6 2020-10-06 2020.4 92 1e+06 10869.57
7 2020-10-07 2020.4 92 1e+06 10869.57
8 2020-10-08 2020.4 92 1e+06 10869.57
9 2020-10-09 2020.4 92 1e+06 10869.57
10 2020-10-10 2020.4 92 1e+06 10869.57
...
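Since the question asks for the calculation to sit directly after the mutate(Quarter = ...) step, the whole thing could be sketched as one chain. This is a variation on the approach above rather than the answer's exact code: it assumes a grouped mutate with n() in place of add_count, names the column daily_budget as the question does, drops the helper budget column at the end, and builds the lookup table inline so the block runs on its own.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Quarterly budgets from the question, keyed by the year.quarter value.
budgets <- tibble(
  Quarter = c(2020.4, 2021.1, 2021.2, 2021.3, 2021.4),
  budget  = c(1000000, 2000000, 3000000, 3000000, 2000000)
)

calendar <- tibble(
  cohort = seq(ymd("2020-10-01"), ymd("2021-12-31"), by = "1 day")
) %>%
  # with_year = TRUE mirrors the question's call.
  mutate(Quarter = quarter(cohort, with_year = TRUE)) %>%
  left_join(budgets, by = "Quarter") %>%
  group_by(Quarter) %>%
  # n() is the number of rows (days) in the current quarter.
  mutate(daily_budget = budget / n()) %>%
  ungroup() %>%
  select(-budget)

head(calendar)
```

The 2020 Q4 rows should carry 1000000 / 92, roughly 10869.57, matching the budget_by_day column in the output above. Note that the join key is a double, so the sketch relies on both sides producing exactly the same year.quarter values.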