问题描述
我有一个长格式的数据框,我想对其进行重组。但是,当我这样做时,出现了严重的错误,我似乎无法想出办法。非常感谢任何帮助!
这是数据(对于 2 个 ID 变量,我还有 300 个)
# A tibble: 86 x 20
# Groups: ID,Day [12]
ID Day Obs Time1 Time1_1 Time_between Time_minutes PA1 NA1 PA2 NA2 PA3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 27154 NA NA NA 4 1 6 1 6
2 1 1 2 30150 27154 2996 49 4 1 5 1 6
3 1 1 3 33266 30150 3116 51 5 1 5 1 6
4 1 1 4 39842 33266 6576 109 5 1 7 1 6
5 1 1 5 46744 39842 6902 115 5 1 6 1 6
6 1 1 6 50643 46744 3899 64 1 1 5 1 4
7 1 1 7 56343 50643 5700 95 3 1 6 1 6
8 1 1 8 61744 56343 5401 90 6 1 6 1 6
9 1 1 9 67205 61744 5461 91 5 1 6 1 6
10 1 1 10 75360 67205 8155 135 4 1 6 1 6
11 1 1 11 78062 75360 2702 45 6 1 6 1 6
12 1 2 12 42844 NA NA NA 1 1 6 1 6
13 1 2 13 47400 42844 4556 75 6 1 6 1 6
14 1 2 14 53522 47400 6122 102 6 1 6 1 6
15 1 2 15 58923 53522 5401 90 5 1 6 1 6
16 1 2 16 63245 58923 4322 72 6 1 6 1 6
17 1 2 17 67562 63245 4317 71 6 1 6 1 6
18 1 2 18 72960 67562 5398 89 5 1 5 1 5
19 1 3 19 43800 NA NA NA 4 1 5 1 7
20 1 3 20 49083 43800 5283 88 4 1 6 1 6
21 1 3 21 54302 49083 5219 86 5 1 6 1 5
22 1 3 22 58324 54302 4022 67 6 1 6 1 6
23 1 3 23 63123 58324 4799 79 5 1 5 1 6
24 1 3 24 70981 63123 7858 130 4 1 6 1 6
25 1 3 25 75603 70981 4622 77 4 1 6 1 5
26 1 3 26 77583 75603 1980 33 5 1 5 1 5
27 1 4 27 27420 NA NA NA 4 1 6 1 5
28 1 4 28 29288 27420 1868 31 4 1 5 1 5
29 1 4 29 35339 29288 6051 100 5 1 4 1 5
30 1 4 30 37744 35339 2405 40 4 1 3 1 5
31 1 4 31 43021 37744 5277 87 4 1 4 1 5
32 1 4 32 51781 43021 8760 146 4 1 4 1 5
33 1 4 33 71460 51781 19679 327 4 1 6 1 6
34 1 4 34 76204 71460 4744 79 4 1 5 1 5
35 1 5 35 33136 NA NA NA 1 1 6 1 5
36 1 5 36 38883 33136 5747 95 4 1 4 1 5
37 1 5 37 45603 38883 6720 112 4 1 5 1 5
38 1 5 38 49445 45603 3842 64 4 1 5 1 5
39 1 5 39 55624 49445 6179 102 5 1 5 1 5
40 1 5 40 67085 55624 11461 191 4 1 5 1 6
41 1 5 41 75724 67085 8639 143 5 1 5 1 5
42 1 6 42 27597 NA NA NA 4 1 5 1 5
43 1 6 43 29711 27597 2114 35 4 1 5 1 5
44 1 6 44 35311 29711 5600 93 4 1 5 1 5
45 1 6 45 45720 35311 10409 173 4 1 5 1 5
46 1 6 46 47880 45720 2160 36 4 1 5 1 5
47 1 6 47 54304 47880 6424 107 4 1 5 1 5
48 1 6 48 62042 54304 7738 128 4 1 5 1 5
49 1 6 49 66725 62042 4683 78 5 1 5 1 5
50 1 6 50 75302 66725 8577 142 4 1 5 1 5
51 2 1 1 31220 NA NA NA 5 1 6 1 7
52 2 1 2 37021 31220 5801 96 4 1 6 1 6
53 2 1 3 38820 37021 1799 29 4 3 5 2 6
54 2 1 4 47041 38820 8221 137 5 3 6 1 6
55 2 1 5 49202 47041 2161 36 4 4 4 2 6
56 2 2 6 27111 NA NA NA 3 1 4 3 5
57 2 2 7 40561 27111 13450 224 2 1 5 1 6
58 2 2 8 45483 40561 4922 82 5 1 5 1 4
59 2 2 9 65582 45483 20099 334 6 1 7 1 7
60 2 2 10 71460 65582 5878 97 6 1 6 1 6
61 2 2 11 77340 71460 5880 98 5 1 6 1 7
62 2 3 12 34566 NA NA NA 4 1 6 1 7
63 2 3 13 41405 34566 6839 113 7 1 5 1 5
64 2 3 14 44223 41405 2818 46 6 1 6 1 6
65 2 3 15 69485 44223 25262 421 5 1 4 1 6
66 2 4 16 37921 NA NA NA 5 1 5 1 6
67 2 4 17 54062 37921 16141 269 5 2 4 4 4
68 2 4 18 60542 54062 6480 108 5 3 5 1 5
69 2 4 19 66360 60542 5818 96 5 1 4 1 5
70 2 4 20 69663 66360 3303 55 4 1 4 1 7
71 2 4 21 76023 69663 6360 106 5 1 5 1 7
72 2 4 22 77463 76023 1440 24 4 1 5 1 5
73 2 5 23 27050 NA NA NA 5 3 5 1 6
74 2 5 24 29400 27050 2350 39 4 1 5 1 5
75 2 5 25 36783 29400 7383 123 5 1 5 1 5
76 2 5 26 42062 36783 5279 87 5 1 4 1 6
77 2 5 27 46984 42062 4922 82 5 1 6 1 5
78 2 5 28 50344 46984 3360 56 4 1 5 1 6
79 2 5 29 56885 50344 6541 109 7 1 7 1 7
80 2 5 30 71101 56885 14216 236 4 1 5 1 7
81 2 6 31 27094 NA NA NA 1 1 4 1 5
82 2 6 32 27559 27094 465 7 1 1 4 1 5
83 2 6 33 40441 27559 12882 214 4 1 5 1 6
84 2 6 34 44763 40441 4322 72 5 1 5 1 6
85 2 6 35 50522 44763 5759 95 5 1 5 1 5
86 2 6 36 60962 50522 10440 174 4 1 5 1 6
# ... with 8 more variables: NA3 <dbl>,PA4 <dbl>,NA4 <dbl>,PA5 <dbl>,NA5 <dbl>,# PA6 <dbl>,NA6 <dbl>,obs <int>
然后我用下面的代码进行重构
datasetSPSSSMESM_wide2 <- reshape(datasetSPSSSMESM_2,timevar="Obs",idvar="ID",direction="wide")
我想要这样的东西
ID Time1_1 Time1_2 Time 1_3 Time1_4 Time1_5 ....
1 27154 30150 33266 39842 46744
2 31220 37021 38820 47041 49202
但是当我查看数据集时,我得到了这个。变量本身都是不适用的,但不知何故,数据似乎都存储在变量名中。
# A tibble: 2 x 19
# Groups: ID [2]
ID `Day.c(1,2,3,~ `Time1.c(1,~ `Time1_1.c(1,2~ `Time_between.c~
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA NA NA NA
2 2 NA NA NA NA
# ... with 14 more variables: `Time_minutes.c(1,4,5,6,7,8,9,10,11,12,13,# 14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,# 35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50)` <dbl>,`PA1.c(1,# 3,14,# 26,35,# 47,`NA1.c(1,# 17,# 38,`PA2.c(1,# 7,# 29,# 50)` <dbl>,`NA2.c(1,# 20,# 41,`PA3.c(1,# 11,# 32,# `NA3.c(1,# 23,# 44,`PA4.c(1,`NA4.c(1,`PA5.c(1,`NA5.c(1,`PA6.c(1,`NA6.c(1,# `obs.c(1,50)` <int>
>
print(datasetSPSSSMESM_wide2)
有人知道我做错了什么吗?
非常感谢!
多米尼克
###########################UPDATE################ #########
现在有了 dput
> dput(head(datasetSPSSSMESM_2))
structure(list(ID = c(1,1,1),Day = c(1,Obs = c(1,6),Time1 = c(27154,30150,33266,39842,46744,50643),Time1_1 = c(NA,27154,46744),Time_between = c(NA,2996,3116,6576,6902,3899),Time_minutes = c(NA,51,109,115,64),PA1 = c(4,NA1 = c(1,PA2 = c(6,5),NA2 = c(1,PA3 = c(6,4),NA3 = c(2,PA4 = c(6,2),NA4 = c(2,PA5 = c(6,3),NA5 = c(1,PA6 = c(6,NA6 = c(1,obs = 1:6),row.names = c(NA,-6L),groups = structure(list(ID = 1,Day = 1,.rows = structure(list(
1:6),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),row.names = 1L,class = c("tbl_df","tbl","data.frame"
),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"
))
整形后
> dput(head(datasetSPSSSMESM_wide2))
structure(list(ID = c(1,`Day.1:11` = c(NA_real_,NA_real_
),`Obs.1:11` = c(NA_real_,NA_real_),`Time1.1:11` = c(NA_real_,`Time1_1.1:11` = c(NA_real_,`Time_between.1:11` = c(NA_real_,`Time_minutes.1:11` = c(NA_real_,`PA1.1:11` = c(NA_real_,`NA1.1:11` = c(NA_real_,`PA2.1:11` = c(NA_real_,`NA2.1:11` = c(NA_real_,`PA3.1:11` = c(NA_real_,`NA3.1:11` = c(NA_real_,`PA4.1:11` = c(NA_real_,`NA4.1:11` = c(NA_real_,`PA5.1:11` = c(NA_real_,`NA5.1:11` = c(NA_real_,`PA6.1:11` = c(NA_real_,`NA6.1:11` = c(NA_real_,NA_real_)),-2L),groups = structure(list(ID = c(1,.rows = structure(list(
1L,2L),row.names = 1:2,reshapeWide = list(v.names = NULL,timevar = "obs",idvar = "ID",times = structure(list(obs = 1:11),-11L),"data.frame")),varying = structure(c("Day.1:11","Obs.1:11","Time1.1:11","Time1_1.1:11","Time_between.1:11","Time_minutes.1:11","PA1.1:11","NA1.1:11","PA2.1:11","NA2.1:11","PA3.1:11","NA3.1:11","PA4.1:11","NA4.1:11","PA5.1:11","NA5.1:11","PA6.1:11","NA6.1:11"),.Dim = c(18L,1L),.Dimnames = list(NULL,"obs"))),"data.frame"))
解决方法
我想如果您想让该数据框更宽,通常您只想选择一个变量来包含。例如,如果您想分析“Time1”,您可以尝试以下操作:
datasetSPSSSMESM_wide3 <- tidyr::pivot_wider(data = datasetSPSSSMESM_2,id_cols = "ID",names_from = "Obs",values_from = "Time1")
(我更熟悉这个 pivot_wider 函数...也许 stats::reshape 也可以用其他一些语法来做到这一点。)
或者,如果你真的想要一个包含所有其他变量的超宽数据框,你可以使用以下内容:
datasetSPSSSMESM_wide3 <- tidyr::pivot_wider(data = datasetSPSSSMESM_2,values_from = c(-ID,-Obs))
希望这会有所帮助!
精萌