group_by lag返回未滞后的变量

问题描述

试图创建一个滞后变量以计算两次观察之间的行驶里程,我感到非常困惑和困惑。

>   fuel_sheet %>% dplyr::select(vehicle,date,mileage)
+                                                       
# A tibble: 64 x 3                                      
# Groups:   vehicle [2]                                 
   vehicle         date       mileage                   
   <fct>           <date>       <dbl>                   
 1 VW T4 Campervan 2015-09-30  150631                   
 2 VW T4 Campervan 2015-10-16  150866                   
 3 VW T4 Campervan 2015-10-18  151395                   
 4 VW T4 Campervan 2015-12-06  151731                   
 5 VW T4 Campervan 2016-01-22  151968                   
 6 VW T4 Campervan 2016-01-24  152360                   
 7 VW T4 Campervan 2016-01-25  152731                   
 8 VW T4 Campervan 2016-02-21  153066                   
 9 VW T4 Campervan 2016-03-29  153500                   
10 VW T4 Campervan 2016-04-16  153751                   
# … with 54 more rows                                   
> fuel_sheet <- fuel_sheet %>%                                    
    group_by(vehicle) %>%                                       
    mutate(miles_lag = lag(mileage,n=1,order_by=date)) %>%                   
    mutate(miles = mileage - lag(mileage,order_by=date))  
> fuel_sheet %>% dplyr::select(vehicle,mileage,miles_lag,miles)
# A tibble: 64 x 5                                   
# Groups:   vehicle [2]                              
   vehicle         date       mileage miles_lag miles
   <fct>           <date>       <dbl>     <dbl> <dbl>
 1 VW T4 Campervan 2015-09-30  150631    150631     0
 2 VW T4 Campervan 2015-10-16  150866    150866     0
 3 VW T4 Campervan 2015-10-18  151395    151395     0
 4 VW T4 Campervan 2015-12-06  151731    151731     0
 5 VW T4 Campervan 2016-01-22  151968    151968     0
 6 VW T4 Campervan 2016-01-24  152360    152360     0
 7 VW T4 Campervan 2016-01-25  152731    152731     0
 8 VW T4 Campervan 2016-02-21  153066    153066     0
 9 VW T4 Campervan 2016-03-29  153500    153500     0
10 VW T4 Campervan 2016-04-16  153751    153751     0
# … with 54 more rows                                

如果我尝试使用diff()代替滞后来直接计算差值,则会得到一个一个元素的数组。

在SO上在这种事情上发现了很多线程,一个示例是r - diff operation within a group,after a dplyr::group_by() - Stack Overflow,但看不到为什么我的代码表现不正常。

plyr没有冲突,因为它没有加载,在这里我看不到任何其他会影响事物的冲突。

> sessionInfo()
R version 4.0.2 (2020-06-22)                                                                            
Platform: x86_64-pc-linux-gnu (64-bit)                                                                  
Running under: Gentoo/Linux                                                                             
                                                                                                        
Matrix products: default                                                                                
BLAS:   /usr/lib64/libblas.so.3.8.0                                                                     
LAPACK: /usr/lib64/liblapack.so.3.8.0                                                                   
                                                                                                        
locale:                                                                                                 
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C              LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8    LC_PAPER=en_GB.utf8       LC_NAME=C            
 [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C  
                                                                                                        
attached base packages:                                                                                 
[1] graphics  Grdevices utils     datasets  stats     methods   base                                    
                                                                                                         
other attached packages:                                                                                 
 [1] reprex_0.3.0        xtable_1.8-4        orgutils_0.4-1      lubridate_1.7.9     googlesheets4_0.2.0 
 [6] chron_2.3-56        ascii_2.3           spelling_2.1        Rsqlite_2.2.0       rmsfact_0.0.3       
[11] rmarkdown_2.3       reticulate_1.16     magrittr_1.5        kableExtra_1.2.1    knitr_1.29          
[16] Hmisc_4.4-1         Formula_1.2-3       survival_3.2-3      lattice_0.20-41     googlesheets_0.3.0  
[21] foreign_0.8-80      fcuk_0.1.21         devtools_2.3.1      usethis_1.6.1       forcats_0.5.0       
[26] stringr_1.4.0       dplyr_1.0.2         purrr_0.3.4         readr_1.3.1         tidyr_1.1.2         
[31] tibble_3.0.3        ggplot2_3.3.2       tidyverse_1.3.0                                             
                                                                                                         
loaded via a namespace (and not attached):                                                               
 [1] googledrive_1.0.1   colorspace_1.4-1    ellipsis_0.3.1      rprojroot_1.3-2     htmlTable_2.0.1     
 [6] base64enc_0.1-3     fs_1.5.0            rstudioapi_0.11     remotes_2.2.0       bit64_4.0.5         
[11] fansi_0.4.1         xml2_1.3.2          splines_4.0.2       pkgload_1.1.0       jsonlite_1.7.1      
[16] broom_0.7.0         cluster_2.1.0       dbplyr_1.4.4        png_0.1-7           clipr_0.7.0         
[21] compiler_4.0.2      httr_1.4.2          backports_1.1.9     assertthat_0.2.1    Matrix_1.2-18       
[26] gargle_0.5.0        cli_2.0.2           later_1.1.0.1       htmltools_0.5.0     prettyunits_1.1.1   
[31] tools_4.0.2         gtable_0.3.0        glue_1.4.2          Rcpp_1.0.5          cellranger_1.1.0    
[36] vctrs_0.3.4         xfun_0.17           ps_1.3.4            testthat_2.3.2      rvest_0.3.6         
[41] lifecycle_0.2.0     textutils_0.2-0     stringdist_0.9.6    scales_1.1.1        promises_1.1.1      
[46] hms_0.5.3           parallel_4.0.2      RColorBrewer_1.1-2  curl_4.3            memoise_1.1.0       
[51] gridExtra_2.3       rpart_4.1-15        latticeExtra_0.6-29 stringi_1.5.3       desc_1.2.0          
[56] checkmate_2.0.0     pkgbuild_1.1.0      rlang_0.4.7         pkgconfig_2.0.3     evaluate_0.14       
[61] htmlwidgets_1.5.1   bit_4.0.4           processx_3.4.4      tidyselect_1.1.0    R6_2.4.1            
[66] generics_0.0.2      DBI_1.1.0           whisker_0.4         pillar_1.4.6        haven_2.3.1         
[71] withr_2.2.0         nnet_7.3-14         modelr_0.1.8        Crayon_1.3.4        utf8_1.1.4          
[76] jpeg_0.1-8.1        grid_4.0.2          readxl_1.3.1        data.table_1.13.0   blob_1.2.1          
[81] callr_3.4.4         digest_0.6.25       webshot_0.5.2       httpuv_1.5.4        openssl_1.4.2       
[86] munsell_0.5.0       viridisLite_0.3.0   askpass_1.1         sessioninfo_1.1.1                       
> tidyverse_conflicts()                                                                                       
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──                                                                                                            
✖ lubridate::as.difftime() masks base::as.difftime()                                                          
✖ lubridate::date()        masks base::date()                                                                 
✖ lubridate::days()        masks chron::days()                                                                
✖ ascii::expand()          masks tidyr::expand()                                                              
✖ magrittr::extract()      masks tidyr::extract()                                                             
✖ stats::filter()          masks dplyr::filter()                                                              
✖ kableExtra::group_rows() masks dplyr::group_rows()                                                          
✖ lubridate::hours()       masks chron::hours()                                                               
✖ lubridate::intersect()   masks base::intersect()                                                            
✖ stats::lag()             masks dplyr::lag()                                                                 
✖ lubridate::minutes()     masks chron::minutes()                                                             
✖ lubridate::seconds()     masks chron::seconds()                                                             
✖ magrittr::set_names()    masks purrr::set_names()                                                           
✖ lubridate::setdiff()     masks base::setdiff()                                                              
✖ Hmisc::src()             masks dplyr::src()                                                                 
✖ Hmisc::summarize()       masks dplyr::summarize()                                                           
✖ lubridate::union()       masks base::union()                                                                
✖ lubridate::years()       masks chron::years()                                                               

我们将不胜感激收到任何建议或指示,如果我需要提供更多信息,请告诉我。

谢谢。

编辑:根据数据的dput()请求,我知道有一种简单的方法可以将数据转储给其他人使用,但是我一生都无法记住它的含义,一直在寻找reprex程序包可以帮助您,但不会成功...

> dput(fuel_sheet %>% dplyr::select(vehicle,mileage))                                    
structure(list(vehicle = structure(c(3L,3L,1L,3L),.Label = c("Ford Focus","Honda Jazz","VW T4 Campervan"),class = "factor"),date = structure(c(16708,16724,16726,16775,16822,16824,16825,16852,16889,16907,16914,16915,16973,17009,17011,17015,17025,17047,17092,17117,17144,17146,17173,17237,17284,17295,17319,17333,17361,17423,17434,17453,17466,17558,17579,17634,17635,17641,17642,17710,17743,17807,17929,17995,17999,18004,18042,18074,18077,18106,18129,18139,18223,18227,18337,18469,18470,18493,18501,18505,18509                                   
),class = "Date"),mileage = c(150631,150866,151395,151731,151968,152360,152731,153066,153500,153751,154363,154619,154866,155473,156008,156199,156637,156997,157157,61476,61774,157458,62175,62409,63209,157769,158325,158803,158956,159314,159706,160126,160550,160986,161363,162000,162449,162879,163215,163636,164005,164613,164802,165352,165651,165946,166488,166987,167131,167749,168013,168361,168853,169371,169996,170636,170640,170954,171573,171763,172286,172616,173105,173266)),row.names = c(NA,-64L),groups = structure(list(                     
    vehicle = structure(c(1L,.rows = structure(list(                              
        c(20L,21L,23L,24L,25L),c(1L,2L,4L,5L,6L,7L,8L,9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,22L,26L,27L,28L,29L,30L,31L,32L,33L,34L,35L,36L,37L,38L,39L,40L,41L,42L,43L,44L,45L,46L,47L,48L,49L,50L,51L,52L,53L,54L,55L,56L,57L,58L,59L,60L,61L,62L,63L,64L)),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),row.names = 1:2,class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))                                                                 

编辑2:省略order_by无济于事...

> fuel_sheet %>%  group_by(vehicle) %>% mutate(miles = mileage - lag(mileage,n=1))    
# A tibble: 64 x 10                                                                    
# Groups:   vehicle [2]                                                                
   vehicle         mileage litre   gbp station   date       month  year miles_lag miles
   <fct>             <dbl> <dbl> <dbl> <fct>     <date>     <dbl> <dbl>     <dbl> <dbl>
 1 VW T4 Campervan  150631  78.0  81.9 Tescos    2015-09-30     9  2015    150631     0
 2 VW T4 Campervan  150866  33.6  36.9 Nisa      2015-10-16    10  2015    150866     0
 3 VW T4 Campervan  151395  66    72.5 Tescos    2015-10-18    10  2015    151395     0
 4 VW T4 Campervan  151731  44    44.4 Morrisons 2015-12-06    12  2015    151731     0
 5 VW T4 Campervan  151968  34.1  34   Other     2016-01-22     1  2016    151968     0
 6 VW T4 Campervan  152360  49.1  48.1 Morrisons 2016-01-24     1  2016    152360     0
 7 VW T4 Campervan  152731  53.2  52.0 Tescos    2016-01-25     1  2016    152731     0
 8 VW T4 Campervan  153066  44.0  43   Tescos    2016-02-21     2  2016    153066     0
 9 VW T4 Campervan  153500  56.9  58   Tescos    2016-03-29     3  2016    153500     0
10 VW T4 Campervan  153751  27.6  29   Tescos    2016-04-16     4  2016    153751     0
# … with 54 more rows                                                                  

解决方法

在您遇到的冲突中

stats::lag()             masks dplyr::lag()

如果您在两个dplyr::lag()中都指定了mutate,则应该可以使用。 (我检查了指定stats::lag()的输出是否与您的输出相同。)

,

这可能是一个非常幼稚的解决方案,但它可行

df[,'mileage_lag']=c(0,diff(df$mileage))

修改

library(data.table)

dt=data.table(df)
dt[,'mileage_lag':=c(0,diff(mileage)),by=.(vehicles)]

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...