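Problem description
I'm using the nest / unnest functions in R for the first time, and I don't understand the result. I nest and then immediately unnest, and compare the data frames before and after. Why are the data frames not identical?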
> library(tidyverse)
> concentration_original <- readRDS("./Data/concentration.Rds")
> print(concentration_original,n=15)
# A tibble: 12 x 5
SUBJID WT DOSE TIME CONC
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 79.6 4.02 0 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.570 6.57
4 1 79.6 4.02 1.12 10.5
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
7 2 72.4 4.4 0 0
8 2 72.4 4.4 0.27 1.72
9 2 72.4 4.4 0.52 7.91
10 2 72.4 4.4 1 8.31
11 2 72.4 4.4 1.92 8.33
12 2 72.4 4.4 3.5 6.85
>
> concentration_nested <- concentration_original %>% nest(data = c(TIME,CONC))
> concentration_nested
# A tibble: 2 x 4
SUBJID WT DOSE data
<dbl> <dbl> <dbl> <list>
1 1 79.6 4.02 <tibble [6 × 2]>
2 2 72.4 4.4 <tibble [6 × 2]>
>
> concentration_unnested <- unnest(concentration_nested,cols = c(data))
> print(concentration_unnested,n=15)
# A tibble: 12 x 5
SUBJID WT DOSE TIME CONC
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 79.6 4.02 0 0.74
2 1 79.6 4.02 0.25 2.84
3 1 79.6 4.02 0.570 6.57
4 1 79.6 4.02 1.12 10.5
5 1 79.6 4.02 2.02 9.66
6 1 79.6 4.02 3.82 8.58
7 2 72.4 4.4 0 0
8 2 72.4 4.4 0.27 1.72
9 2 72.4 4.4 0.52 7.91
10 2 72.4 4.4 1 8.31
11 2 72.4 4.4 1.92 8.33
12 2 72.4 4.4 3.5 6.85
>
> if (identical(concentration_unnested,concentration_original)) {
+ print("After nest/unnest, we have a dataframe which IS IDENTICAL to the original")
+ } else {
+ print("After nest/unnest, we have a dataframe which IS NOT IDENTICAL to the original")
+ }
[1] "After nest/unnest, we have a dataframe which IS NOT IDENTICAL to the original"
>
> all.equal(concentration_unnested,concentration_original)
[1] "Attributes: < Length mismatch: comparison on first 2 components >"
>
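Note that I'm using all.equal to check whether the problem might be related to attributes. If I use all_equal instead, the result is TRUE, but I'm still stuck with identical saying the data frames are not the same. Thanks for your help!
Added the dput of the original df and of the nested/unnested df.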
> dput(concentration_original)
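structure(list(SUBJID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), WT = c(79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4), DOSE = c(4.02, 4.02, 4.02, 4.02, 4.02, 4.02, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4), TIME = c(0, 0.25, 0.57, 1.12, 2.02, 3.82, 0, 0.27, 0.52, 1, 1.92, 3.5), CONC = c(0.74, 2.84, 6.57, 10.5, 9.66, 8.58, 0, 1.72, 7.91, 8.31, 8.33, 6.85)), spec = structure(list(cols = list(SUBJID = structure(list(), class = c("collector_double", "collector")), WT = structure(list(), class = c("collector_double", "collector")), DOSE = structure(list(), class = c("collector_double", "collector")), TIME = structure(list(), class = c("collector_double", "collector")), CONC = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))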
> dput(concentration_unnested)
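structure(list(SUBJID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), WT = c(79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4), DOSE = c(4.02, 4.02, 4.02, 4.02, 4.02, 4.02, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4), TIME = c(0, 0.25, 0.57, 1.12, 2.02, 3.82, 0, 0.27, 0.52, 1, 1.92, 3.5), CONC = c(0.74, 2.84, 6.57, 10.5, 9.66, 8.58, 0, 1.72, 7.91, 8.31, 8.33, 6.85)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))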
>
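Additional information: I think I found the problem. The spec attribute of the original tibble holds information from when the tibble was created with read_csv. When the tibble goes through nest/unnest, the spec attribute is dropped. Another thread also mentions that the spec attribute can fall out of sync with the tibble's contents: Remove attributes from data read in readr::read_csv. In that thread, the suggestion is to remove the spec attribute: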
attr(df,'spec') <- NULL
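Solution
From what I found, your original data frame is not identical to the output because the original carries a spec attribute (an object of class col_spec), while the output does not.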
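identical() is strict: two objects with equal values but different sets of attributes are not identical, which is also why all.equal() talks about attributes. A minimal base-R sketch with made-up data; the attribute name spec mirrors readr's, but its value here is a placeholder string, not a real col_spec object:

```r
# Two data frames with the same columns and values (made-up data).
a <- data.frame(TIME = c(0, 0.25), CONC = c(0.74, 2.84))
b <- a

# Give only `a` an extra attribute. The name "spec" mirrors readr's
# attribute; the value is a placeholder string, not a real col_spec.
attr(a, "spec") <- "placeholder"

identical(a, b)                            # FALSE: attributes are compared too
isTRUE(all.equal(a, b))                    # FALSE: all.equal() reports the extra attribute
all.equal(a, b, check.attributes = FALSE)  # TRUE: compares column values only
```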
Using the new waldo package (developed by the tidyverse team), I ran the following:
compare(df,df %>% nest(data = c(TIME,CONC)) %>% unnest(cols = c(data)))
`attr(old,'spec')` is an S3 object of class <col_spec>
`attr(new,'spec')` is absent
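If waldo is not installed, a rough base-R alternative is to compare the attribute names of the two objects directly. attr_diff() below is a hypothetical helper written for illustration, not a function from any package, and the demo uses made-up data with a placeholder spec value:

```r
# Hypothetical helper: report attribute names present in one object
# but not in the other. Uses only base R.
attr_diff <- function(old, new) {
  old_names <- names(attributes(old))
  new_names <- names(attributes(new))
  list(
    only_in_old = setdiff(old_names, new_names),
    only_in_new = setdiff(new_names, old_names)
  )
}

# Small demonstration: only `x` carries a (placeholder) "spec" attribute.
x <- data.frame(TIME = c(0, 0.25), CONC = c(0.74, 2.84))
y <- x
attr(x, "spec") <- "placeholder"
attr_diff(x, y)
```

Applied to the question's original tibble and its nested/unnested copy, it would be expected to list spec under only_in_old, in line with the waldo output. Unlike waldo, it only looks at attribute names, not at differing attribute values.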
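It looks like you read the data with readr, so the resulting df carries a spec attribute of class col_spec. Nesting the original df drops this attribute: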
attr(df %>% nest(data = c(TIME,CONC)),'spec')
NULL
So by the time you unnest, the df is no longer identical to the original.
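A self-contained sketch of the round trip, assuming tibble and tidyr are installed. The data are rebuilt from the printed output in the question, and the spec value is a placeholder (an empty list with class col_spec), not the object readr actually creates. It shows two ways to make identical() agree: drop spec from the original (the workaround quoted in the question), or copy spec onto the result.

```r
library(tibble)
library(tidyr)

# Data rebuilt from the printed output in the question.
df <- tibble(
  SUBJID = rep(c(1, 2), each = 6),
  WT     = rep(c(79.6, 72.4), each = 6),
  DOSE   = rep(c(4.02, 4.4), each = 6),
  TIME   = c(0, 0.25, 0.57, 1.12, 2.02, 3.82, 0, 0.27, 0.52, 1, 1.92, 3.5),
  CONC   = c(0.74, 2.84, 6.57, 10.5, 9.66, 8.58, 0, 1.72, 7.91, 8.31, 8.33, 6.85)
)

# Placeholder standing in for readr's col_spec object (assumption:
# a real readr spec is dropped by nest/unnest in the same way).
attr(df, "spec") <- structure(list(), class = "col_spec")

round_trip <- unnest(nest(df, data = c(TIME, CONC)), cols = c(data))
identical(round_trip, df)  # FALSE: round_trip has no spec attribute

# Option 1: drop spec from the original.
df_plain <- df
attr(df_plain, "spec") <- NULL
identical(round_trip, df_plain)

# Option 2: copy spec onto the result. Only sensible when the round trip
# leaves the columns unchanged, since spec describes the columns as read.
round_trip_spec <- round_trip
attr(round_trip_spec, "spec") <- attr(df, "spec")
identical(round_trip_spec, df)
```

Option 1 is usually the safer choice, because, as the thread linked in the question points out, spec can fall out of sync with the data once columns are modified.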