Problem description
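I am using pivot_longer to reshape my data from wide to long format with multiple value columns. I know there are related questions (Pivot_longer 6 columns to 3 columns, or Tidy dataset with pivot_longer: Multiple columns into two columns), but so far I have not found a solution, possibly because my two groups of columns have different classes: the first is POSIXct and the second is numeric.
Here is a minimal working example: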
structure(list(compid = c("AT9130162999","AT9090003478","AT9070005375","AT9130048156"),iso2c = c("AT","AT","AT"),nace4 = c("7010","4211","2452","7010"),lastyear = c("2018","2019","2019"),`Closing date
Last avail. yr` = structure(c(1546214400,1577750400,1585612800,1577750400),tzone = "UTC",class = c("POSIXct","POSIXt")),`Closing date
Year - 1` = structure(c(1514678400,1546214400,1553990400,1546214400),`Closing date
Year - 2` = structure(c(NA,1514678400,1522454400,1514678400),`Closing date
Year - 3` = structure(c(NA,1483142400,1490918400,1483142400),`Closing date
Year - 4` = structure(c(NA,1451520000,1459382400,1451520000),`Closing date
Year - 5` = structure(c(NA,1419984000,1427760000,1419984000),`Closing date
Year - 6` = structure(c(NA,1388448000,1396224000,1388448000),`Closing date
Year - 7` = structure(c(NA,1356912000,1364688000,1356912000),`Closing date
Year - 8` = structure(c(NA,1325289600,1333152000,1325289600),`Closing date
Year - 9` = structure(c(NA,1293753600,1301529600,1293753600),operatinginc_last = c(NA,482813,-94300,NA),operatinginc_year1 = c(NA,423482,780400,operatinginc_year2 = c(NA,404694,1210300,ebit_last = c(1060000,351292),ebit_year1 = c(1501000,331415),ebit_year2 = c(NA,305492),operatingrev_last = c(28463000,15842418,13009700,11742884),operatingrev_year1 = c(NA,13734462,13146300,10682889
),operatingrev_year2 = c(NA,10682889)),row.names = c(NA,-4L),class = c("tbl_df","tbl","data.frame"))
So far I have tried:
df_l <- df %>%
  pivot_longer(cols = -starts_with(c("compid", "iso2c", "nace4", "lastyear", "Closing")),
               names_to = c("variable", "year"), names_sep = "_",
               values_to = "value", values_drop_na = TRUE)
But now I also want to reshape all columns whose names start with Closing. How can I do that (ideally in a single pivot_longer step)?
The expected output should contain the variable, year, and value columns, as well as closingdate and date columns:
  compid iso2c nace4 lastyear closingdate                   date       variable year  value
  <chr>  <chr> <chr> <chr>    <chr>                         <dttm>     <chr>    <chr> <dbl>
1 AT913~ AT    7010  2018     `Closing date Last avail. yr` 2018-12-31 ebit     last  28463000
2 AT913~ AT    7010  2018     `Closing date Year - 1`       2017-12-31 ebit     year1 15362687
3 AT913~ AT    7010  2018     `Closing date Year - 1`       2016-12-31 ebit     year2 404694
Solution
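I don't know how you would do this in a single call to pivot_longer, because your variables follow different naming schemes and you also want to lengthen the closing-date variables. So here it is in two calls, with some cleanup of the closing variable.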
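library(tidyverse)
df_l <- df %>%
  # First call: stack the POSIXct "Closing date ..." columns
  pivot_longer(cols = starts_with("Closing"), names_to = "closing",
               values_to = "date", values_drop_na = TRUE) %>%
  # Second call: stack the numeric columns named <variable>_<year>
  pivot_longer(cols = contains("_"), names_to = c("variable", "year"),
               names_sep = "_", values_to = "value") %>%
  # Clean up the labels: drop the prefix and the embedded line break
  mutate(closing = closing %>%
           str_remove_all("Closing date") %>%
           str_remove_all("[:cntrl:]") %>%
           str_squish())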
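To make the two-call approach easier to try out, here is a minimal sketch on a small toy table. The data below are hypothetical (they are not the asker's values), and the column names are only assumed to mirror the pattern in the question: date columns whose names start with "Closing date" followed by a line break, and numeric columns named variable_year.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical toy data, made up for illustration only
df <- tibble(
  compid = c("A", "B"),
  `Closing date\nLast avail. yr` = as.POSIXct(c("2018-12-31", "2019-12-31"), tz = "UTC"),
  `Closing date\nYear - 1` = as.POSIXct(c("2017-12-31", "2018-12-31"), tz = "UTC"),
  ebit_last = c(10, 20),
  ebit_year1 = c(11, 21)
)

df_l <- df %>%
  # Call 1: the POSIXct columns go into a closing/date pair
  pivot_longer(cols = starts_with("Closing"), names_to = "closing",
               values_to = "date", values_drop_na = TRUE) %>%
  # Call 2: the numeric columns are split on "_" into variable and year
  pivot_longer(cols = contains("_"), names_to = c("variable", "year"),
               names_sep = "_", values_to = "value") %>%
  # Strip the "Closing date" prefix and the line break from the labels
  mutate(closing = closing %>%
           str_remove_all("Closing date") %>%
           str_remove_all("[[:cntrl:]]") %>%
           str_squish())

df_l
```

Note that the two calls are independent of each other: every closing-date row is combined with every value column, so the toy table grows to 2 companies x 2 dates x 2 value columns = 8 rows. If each date should only be matched with the value from the same year, an additional filter or join on a common year key would be needed afterwards.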