Identifying replicated rows in a data frame

Problem description

Below is the data set I am working with:

  index d1_t1 d1_t2 d1_t3 d1_t4 d2_t1 d2_t2 d2_t3 d2_t4 d3_t1 d3_t2 d3_t3 d3_t4 d4_t1 d4_t2 d4_t3 d4_t4 d5_t1 d5_t2 d5_t3 d5_t4 d6_t1 d6_t2 d6_t3 d6_t4
   101     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
   200     1     1     1     1     1     1     0     0     1     1     1     0     1     1     1     1     1     1     1     1     1     1     0     0
   200     1     1     1     0     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
   101     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1

  d7_t1 d7_t2 d7_t3 d7_t4
    1     1     1     1
    1     1     0     0
    1     1     1     1
    1     1     1     1

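A brief description of the variables:

d1_t1 = Day 1, time 1
d1_t2 = Day 1, time 2
....
d2_t1 = Day 2, time 1
d2_t2 = Day 2, time 2

0, 1 = different types of measurement taken at a given time

I want to identify the series (rows) that have the same measurements throughout the week.

Desired output: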

  index d1_t1 d1_t2 d1_t3 d1_t4 d2_t1 d2_t2 d2_t3 d2_t4 d3_t1 d3_t2 d3_t3 d3_t4 d4_t1 d4_t2 d4_t3 d4_t4 d5_t1 d5_t2 d5_t3 d5_t4 d6_t1 d6_t2 d6_t3 d6_t4 d7_t1 d7_t2 d7_t3 d7_t4
1   101     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1

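Sample data:

# Values follow the printed table above (4 rows, 29 columns).
df <- structure(list(
  index = c(101, 200, 200, 101),
  d1_t1 = c(1, 1, 1, 1), d1_t2 = c(1, 1, 1, 1), d1_t3 = c(1, 1, 1, 1), d1_t4 = c(1, 1, 0, 1),
  d2_t1 = c(1, 1, 1, 1), d2_t2 = c(1, 1, 1, 1), d2_t3 = c(1, 0, 1, 1), d2_t4 = c(1, 0, 1, 1),
  d3_t1 = c(1, 1, 1, 1), d3_t2 = c(1, 1, 1, 1), d3_t3 = c(1, 1, 1, 1), d3_t4 = c(1, 0, 1, 1),
  d4_t1 = c(1, 1, 1, 1), d4_t2 = c(1, 1, 1, 1), d4_t3 = c(1, 1, 1, 1), d4_t4 = c(1, 1, 1, 1),
  d5_t1 = c(1, 1, 1, 1), d5_t2 = c(1, 1, 1, 1), d5_t3 = c(1, 1, 1, 1), d5_t4 = c(1, 1, 1, 1),
  d6_t1 = c(1, 1, 1, 1), d6_t2 = c(1, 1, 1, 1), d6_t3 = c(1, 0, 1, 1), d6_t4 = c(1, 0, 1, 1),
  d7_t1 = c(1, 1, 1, 1), d7_t2 = c(1, 1, 1, 1), d7_t3 = c(1, 0, 1, 1), d7_t4 = c(1, 0, 1, 1)
), row.names = c(NA, 4L), class = "data.frame")

df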

Solution

One dplyr option could be:

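library(dplyr)

df %>%
  group_by_all() %>%                    # group by every column, so identical rows share a group
  filter(n() > 1 & row_number() == 1)   # keep groups with more than one row, first row of each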
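To see what the pipeline does on a smaller input, here is a minimal sketch. The `toy` data frame and the `dup_rows` name are hypothetical and only for illustration. It also uses `group_by(across(everything()))`, which in dplyr 1.0.0 and later is the non-superseded spelling of `group_by_all()`; the result should be the same.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 4 are identical, rows 2 and 3 occur once.
toy <- data.frame(
  index = c(101, 200, 200, 101),
  d1_t1 = c(1, 1, 1, 1),
  d1_t2 = c(1, 0, 1, 1)
)

dup_rows <- toy %>%
  group_by(across(everything())) %>%      # one group per distinct row
  filter(n() > 1, row_number() == 1) %>%  # replicated groups only, one row each
  ungroup()

dup_rows
```

Only the series with index 101 is returned, because it is the only row that appears more than once.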

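A data.table option:

library(data.table)

# setDT() converts df to a data.table by reference.
# .I holds the row numbers; .N > 1 keeps only groups of identical rows that occur more than once.
setDT(df)[, .I[.N > 1], by = names(df)]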

With that, though, you also get an extra V1 column (the row numbers), which you can of course drop, or do this instead:

setDT(df)

# Use the row numbers in V1 to subset the original table
df[df[, .I[.N > 1], by = names(df)]$V1, ]

If you need only one row per series, you can wrap that last `df[...]` call in `unique()`.
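The data.table variants above can be compared side by side in a small sketch. The `toy` table and the names `res`, `dup_idx`, `all_copies`, and `one_per_series` are hypothetical, chosen only for this illustration.

```r
library(data.table)

# Hypothetical toy data: rows 1 and 4 are identical, rows 2 and 3 occur once.
toy <- data.table(
  index = c(101, 200, 200, 101),
  d1_t1 = c(1, 1, 1, 1),
  d1_t2 = c(1, 0, 1, 1)
)

# Variant 1: grouped result, then drop the extra V1 column by reference.
res <- toy[, .I[.N > 1], by = names(toy)]
res[, V1 := NULL]

# Variant 2: collect the row numbers and subset the original table.
dup_idx <- toy[, .I[.N > 1], by = names(toy)]$V1
all_copies <- toy[dup_idx]

# Variant 3: one row per replicated series.
one_per_series <- unique(toy[dup_idx])
```

Variants 1 and 2 return every copy of a replicated row, while variant 3 collapses them to a single row per series.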
