Problem description
I'm getting very confused trying to create a lagged variable to calculate the miles driven between two observations.
> fuel_sheet %>% dplyr::select(vehicle,date,mileage)
+
# A tibble: 64 x 3
# Groups: vehicle [2]
vehicle date mileage
<fct> <date> <dbl>
1 VW T4 Campervan 2015-09-30 150631
2 VW T4 Campervan 2015-10-16 150866
3 VW T4 Campervan 2015-10-18 151395
4 VW T4 Campervan 2015-12-06 151731
5 VW T4 Campervan 2016-01-22 151968
6 VW T4 Campervan 2016-01-24 152360
7 VW T4 Campervan 2016-01-25 152731
8 VW T4 Campervan 2016-02-21 153066
9 VW T4 Campervan 2016-03-29 153500
10 VW T4 Campervan 2016-04-16 153751
# … with 54 more rows
> fuel_sheet <- fuel_sheet %>%
group_by(vehicle) %>%
mutate(miles_lag = lag(mileage,n=1,order_by=date)) %>%
mutate(miles = mileage - lag(mileage,order_by=date))
> fuel_sheet %>% dplyr::select(vehicle,mileage,miles_lag,miles)
# A tibble: 64 x 5
# Groups: vehicle [2]
vehicle date mileage miles_lag miles
<fct> <date> <dbl> <dbl> <dbl>
1 VW T4 Campervan 2015-09-30 150631 150631 0
2 VW T4 Campervan 2015-10-16 150866 150866 0
3 VW T4 Campervan 2015-10-18 151395 151395 0
4 VW T4 Campervan 2015-12-06 151731 151731 0
5 VW T4 Campervan 2016-01-22 151968 151968 0
6 VW T4 Campervan 2016-01-24 152360 152360 0
7 VW T4 Campervan 2016-01-25 152731 152731 0
8 VW T4 Campervan 2016-02-21 153066 153066 0
9 VW T4 Campervan 2016-03-29 153500 153500 0
10 VW T4 Campervan 2016-04-16 153751 153751 0
# … with 54 more rows
If I try using diff()
instead of lag to calculate the difference directly, I get a vector that is one element shorter.
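Because `diff()` returns one value fewer than its input, it has to be padded before it can fill a column, and with several vehicles it has to be applied per vehicle. A minimal base-R sketch of both points, using the mileage values at rows 19 to 23 of the `dput()` output further down, where the two vehicles appear to interleave. The vehicle labels are inferred from the `.rows` grouping in that output, so treat them as an assumption, and `pad_diff()` is a hypothetical helper name. Padding with `NA` rather than 0 is a choice that keeps each vehicle's first fill-up visibly undefined.

```r
# Mileage at rows 19-23 of the dput() output; the vehicle labels are
# inferred from its .rows grouping (an assumption)
vehicle <- c("VW T4 Campervan", "Ford Focus", "Ford Focus",
             "VW T4 Campervan", "Ford Focus")
mileage <- c(157157, 61476, 61774, 157458, 62175)

# Hypothetical helper: diff() is one element short, so put an NA in front
pad_diff <- function(x) c(NA, diff(x))

# Ungrouped: right length, but neighbouring rows can belong to different
# vehicles, giving meaningless jumps of roughly 95,000 miles
pad_diff(mileage)

# Grouped: ave() applies pad_diff() within each vehicle and returns the
# results in the original row order (assumes rows are already in date
# order within each vehicle)
miles <- ave(mileage, vehicle, FUN = pad_diff)
miles
```

`ave()` is used here only because it needs no extra packages; inside a dplyr pipeline the same padding would be `c(NA, diff(mileage))` within a grouped `mutate()`.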
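I've found lots of threads on SO about this sort of thing, one example being "diff operation within a group, after a dplyr::group_by()", but I can't see why my code isn't behaving as expected.
There are no conflicts with plyr
since it isn't loaded, and I can't see any other conflicts here that would affect things.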
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8 LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] reprex_0.3.0 xtable_1.8-4 orgutils_0.4-1 lubridate_1.7.9 googlesheets4_0.2.0
[6] chron_2.3-56 ascii_2.3 spelling_2.1 RSQLite_2.2.0 rmsfact_0.0.3
[11] rmarkdown_2.3 reticulate_1.16 magrittr_1.5 kableExtra_1.2.1 knitr_1.29
[16] Hmisc_4.4-1 Formula_1.2-3 survival_3.2-3 lattice_0.20-41 googlesheets_0.3.0
[21] foreign_0.8-80 fcuk_0.1.21 devtools_2.3.1 usethis_1.6.1 forcats_0.5.0
[26] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[31] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] googledrive_1.0.1 colorspace_1.4-1 ellipsis_0.3.1 rprojroot_1.3-2 htmlTable_2.0.1
[6] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.11 remotes_2.2.0 bit64_4.0.5
[11] fansi_0.4.1 xml2_1.3.2 splines_4.0.2 pkgload_1.1.0 jsonlite_1.7.1
[16] broom_0.7.0 cluster_2.1.0 dbplyr_1.4.4 png_0.1-7 clipr_0.7.0
[21] compiler_4.0.2 httr_1.4.2 backports_1.1.9 assertthat_0.2.1 Matrix_1.2-18
[26] gargle_0.5.0 cli_2.0.2 later_1.1.0.1 htmltools_0.5.0 prettyunits_1.1.1
[31] tools_4.0.2 gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5 cellranger_1.1.0
[36] vctrs_0.3.4 xfun_0.17 ps_1.3.4 testthat_2.3.2 rvest_0.3.6
[41] lifecycle_0.2.0 textutils_0.2-0 stringdist_0.9.6 scales_1.1.1 promises_1.1.1
[46] hms_0.5.3 parallel_4.0.2 RColorBrewer_1.1-2 curl_4.3 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.5.3 desc_1.2.0
[56] checkmate_2.0.0 pkgbuild_1.1.0 rlang_0.4.7 pkgconfig_2.0.3 evaluate_0.14
[61] htmlwidgets_1.5.1 bit_4.0.4 processx_3.4.4 tidyselect_1.1.0 R6_2.4.1
[66] generics_0.0.2 DBI_1.1.0 whisker_0.4 pillar_1.4.6 haven_2.3.1
[71] withr_2.2.0 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 utf8_1.1.4
[76] jpeg_0.1-8.1 grid_4.0.2 readxl_1.3.1 data.table_1.13.0 blob_1.2.1
[81] callr_3.4.4 digest_0.6.25 webshot_0.5.2 httpuv_1.5.4 openssl_1.4.2
[86] munsell_0.5.0 viridisLite_0.3.0 askpass_1.1 sessioninfo_1.1.1
> tidyverse_conflicts()
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date() masks base::date()
✖ lubridate::days() masks chron::days()
✖ ascii::expand() masks tidyr::expand()
✖ magrittr::extract() masks tidyr::extract()
✖ stats::filter() masks dplyr::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ lubridate::hours() masks chron::hours()
✖ lubridate::intersect() masks base::intersect()
✖ stats::lag() masks dplyr::lag()
✖ lubridate::minutes() masks chron::minutes()
✖ lubridate::seconds() masks chron::seconds()
✖ magrittr::set_names() masks purrr::set_names()
✖ lubridate::setdiff() masks base::setdiff()
✖ Hmisc::src() masks dplyr::src()
✖ Hmisc::summarize() masks dplyr::summarize()
✖ lubridate::union() masks base::union()
✖ lubridate::years() masks chron::years()
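The `stats::lag() masks dplyr::lag()` line in this list is the one that matters. A quick way to check which `lag` an unqualified call resolves to is base R's `find()` together with the function's enclosing namespace. A sketch, assuming only that dplyr is installed; in a fresh session dplyr is attached after stats, so the result may differ from the session shown above.

```r
library(dplyr)

# Every attached package that provides an object called `lag`, in search
# order; the first entry is what an unqualified lag() call will use
hits <- find("lag")
hits

# Namespace that the unqualified `lag` actually comes from
environmentName(environment(lag))

# Namespace-qualified calls do not depend on attach order
environmentName(environment(dplyr::lag))
environmentName(environment(stats::lag))
```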
Any suggestions or pointers would be greatly appreciated; let me know if I need to provide more information.
Thanks.
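Edit: following the request for a dput()
of the data. I knew there was a simple way of dumping data for others to use, but couldn't for the life of me remember what it was called, and had been looking at the reprex
package to help, without success...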
> dput(fuel_sheet %>% dplyr::select(vehicle,mileage))
structure(list(vehicle = structure(c(3L,3L,1L,3L),.Label = c("Ford Focus","Honda Jazz","VW T4 Campervan"),class = "factor"),date = structure(c(16708,16724,16726,16775,16822,16824,16825,16852,16889,16907,16914,16915,16973,17009,17011,17015,17025,17047,17092,17117,17144,17146,17173,17237,17284,17295,17319,17333,17361,17423,17434,17453,17466,17558,17579,17634,17635,17641,17642,17710,17743,17807,17929,17995,17999,18004,18042,18074,18077,18106,18129,18139,18223,18227,18337,18469,18470,18493,18501,18505,18509
),class = "Date"),mileage = c(150631,150866,151395,151731,151968,152360,152731,153066,153500,153751,154363,154619,154866,155473,156008,156199,156637,156997,157157,61476,61774,157458,62175,62409,63209,157769,158325,158803,158956,159314,159706,160126,160550,160986,161363,162000,162449,162879,163215,163636,164005,164613,164802,165352,165651,165946,166488,166987,167131,167749,168013,168361,168853,169371,169996,170636,170640,170954,171573,171763,172286,172616,173105,173266)),row.names = c(NA,-64L),groups = structure(list(
vehicle = structure(c(1L,.rows = structure(list(
c(20L,21L,23L,24L,25L),c(1L,2L,4L,5L,6L,7L,8L,9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,22L,26L,27L,28L,29L,30L,31L,32L,33L,34L,35L,36L,37L,38L,39L,40L,41L,42L,43L,44L,45L,46L,47L,48L,49L,50L,51L,52L,53L,54L,55L,56L,57L,58L,59L,60L,61L,62L,63L,64L)),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),row.names = 1:2,class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))
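`dput()` prints R code that rebuilds an object, which is why it is the usual way to share data in a question. A small sketch, using the first three rows of the printout at the top as a hypothetical stand-in for `fuel_sheet`, that also checks the printed text really recreates the data. For a grouped tibble, calling `dplyr::ungroup()` first keeps the bulky `groups` attribute out of the output.

```r
# Hypothetical stand-in for fuel_sheet: first three rows of the printout
toy <- data.frame(
  vehicle = factor(rep("VW T4 Campervan", 3)),
  date    = as.Date(c("2015-09-30", "2015-10-16", "2015-10-18")),
  mileage = c(150631, 150866, 151395)
)

# dput() prints R code that rebuilds the object; paste that into a question
dput(toy)

# The same text captured as a string, to check that it round-trips
txt  <- paste(deparse(toy), collapse = "\n")
copy <- eval(parse(text = txt))
stopifnot(identical(copy, toy))
```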
Edit 2: omitting order_by
doesn't help...
> fuel_sheet %>% group_by(vehicle) %>% mutate(miles = mileage - lag(mileage,n=1))
# A tibble: 64 x 10
# Groups: vehicle [2]
vehicle mileage litre gbp station date month year miles_lag miles
<fct> <dbl> <dbl> <dbl> <fct> <date> <dbl> <dbl> <dbl> <dbl>
1 VW T4 Campervan 150631 78.0 81.9 Tescos 2015-09-30 9 2015 150631 0
2 VW T4 Campervan 150866 33.6 36.9 Nisa 2015-10-16 10 2015 150866 0
3 VW T4 Campervan 151395 66 72.5 Tescos 2015-10-18 10 2015 151395 0
4 VW T4 Campervan 151731 44 44.4 Morrisons 2015-12-06 12 2015 151731 0
5 VW T4 Campervan 151968 34.1 34 Other 2016-01-22 1 2016 151968 0
6 VW T4 Campervan 152360 49.1 48.1 Morrisons 2016-01-24 1 2016 152360 0
7 VW T4 Campervan 152731 53.2 52.0 Tescos 2016-01-25 1 2016 152731 0
8 VW T4 Campervan 153066 44.0 43 Tescos 2016-02-21 2 2016 153066 0
9 VW T4 Campervan 153500 56.9 58 Tescos 2016-03-29 3 2016 153500 0
10 VW T4 Campervan 153751 27.6 29 Tescos 2016-04-16 4 2016 153751 0
# … with 54 more rows
Solution
Among your conflicts is
stats::lag() masks dplyr::lag()
If you specify dplyr::lag()
in both mutate
calls, it should work. (I checked that specifying stats::lag()
gives the same output as yours.)
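Why the masked version gives 0: `stats::lag()` is a time-series function whose default method only shifts the object's time base (its `tsp` attribute) and leaves the values untouched, while `n` and `order_by` are not among its arguments and are silently absorbed by `...`. So `mileage - lag(mileage)` is 0 on every row. A sketch contrasting the two on a hypothetical toy tibble (mileage values from the `dput()` output, dates made up), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: mileage values taken from the dput() output,
# dates made up only to give each vehicle an order
fuel <- tibble(
  vehicle = factor(c("VW T4 Campervan", "Ford Focus", "Ford Focus",
                     "VW T4 Campervan", "Ford Focus")),
  date    = as.Date("2017-01-01") + 0:4,
  mileage = c(157157, 61476, 61774, 157458, 62175)
)

# stats::lag(): same values back (plus a "tsp" attribute); `n` is not one
# of its arguments and disappears into `...`
x <- fuel$mileage
same <- stats::lag(x, n = 1)
as.vector(x - same)    # all zeros, like the miles column in the question

# dplyr::lag(), namespace-qualified so attach order no longer matters:
# shifts values within each vehicle, ordered by date
good <- fuel %>%
  group_by(vehicle) %>%
  mutate(miles_lag = dplyr::lag(mileage, n = 1, order_by = date),
         miles     = mileage - miles_lag) %>%
  ungroup()
good
```

The first row of each vehicle gets `NA` because there is no earlier fill-up to subtract.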
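This might be a very naive solution, but it works:
df[,'mileage_lag']=c(0,diff(df$mileage))  # one running difference over all rows: mixes vehicles if df holds more than one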
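Edit
library(data.table)
dt=data.table(df)
# difference within each vehicle; 0 for each vehicle's first fill-up (assumes rows are already in date order)
dt[,'mileage_lag':=c(0,diff(mileage)),by=.(vehicle)]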
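A variant of the data.table approach that sorts by date within each vehicle first and uses data.table's own `shift()` (a one-step lag with `NA` fill by default) in place of padding `diff()`, so each vehicle's first fill-up comes out as `NA` rather than 0. This is a swapped-in alternative, sketched on a hypothetical toy table (mileage values from the `dput()` output, made-up dates) and assuming data.table is installed:

```r
library(data.table)

# Hypothetical toy data: mileage values from the dput() output, made-up dates
dt <- data.table(
  vehicle = c("VW T4 Campervan", "Ford Focus", "Ford Focus",
              "VW T4 Campervan", "Ford Focus"),
  date    = as.Date("2017-01-01") + 0:4,
  mileage = c(157157, 61476, 61774, 157458, 62175)
)

# Sort in place by vehicle, then date, so "previous row" means
# "previous fill-up of the same vehicle"
setorder(dt, vehicle, date)

# shift() lags by one row with NA fill; by = vehicle restarts it per vehicle
dt[, miles := mileage - shift(mileage), by = vehicle]
dt
```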