Remove duplicate records where each element spans 4 rows

Problem description

I am trying to remove duplicate elements from my data frame.

# A tibble: 12 x 3
       g h         i
   <dbl> <chr> <int>
 1     1 a         1
 2     1 b         2
 3     1 c         3
 4     1 d         4
 5     2 a         5
 6     2 b         6
 7     2 c         7
 8     2 d         8
 9     1 a         9
10     1 b        10
11     1 c        11
12     1 d        12

But each element spans 4 rows, and I want to keep those blocks intact, like this:

# A tibble: 8 x 3
      g h         i
  <dbl> <chr> <int>
1     1 a         1
2     1 b         2
3     1 c         3
4     1 d         4
5     2 a         5
6     2 b         6
7     2 c         7
8     2 d         8

I tried distinct() and unique(), but neither worked.

Solution

We can use distinct on the selected columns g and h, with .keep_all = TRUE so the remaining column is kept:

library(dplyr)
distinct(df1,g,h,.keep_all = TRUE)

-output

#  g h i
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 1 d 4
#5 2 a 5
#6 2 b 6
#7 2 c 7
#8 2 d 8

Or with duplicated

df1[!duplicated(df1[c('g','h')]),]
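To see why plain unique() (or distinct() with no column arguments) did not work while the duplicated() approach does, here is a minimal base R sketch. It assumes the data from the question, rebuilt with rep() for brevity rather than taken from the original dput: whole-row comparison finds no duplicates because i differs in every row, whereas comparing only g and h flags rows 9 to 12.

```r
# Example data from the question, rebuilt with rep() (illustrative):
# rows 9-12 repeat the (g, h) pairs of rows 1-4
df1 <- data.frame(
  g = rep(c(1, 2, 1), each = 4),
  h = rep(c("a", "b", "c", "d"), times = 3),
  i = 1:12
)

# unique() compares entire rows; i is different in every row,
# so no row counts as a duplicate and all 12 rows are kept
nrow(unique(df1))

# duplicated() on only g and h marks a row TRUE when its (g, h)
# pair already appeared in an earlier row
dup <- duplicated(df1[c("g", "h")])
which(dup)

# Keep the first occurrence of each (g, h) pair
res <- df1[!dup, ]
nrow(res)
```

Because duplicated() keeps the first occurrence, the 4-row blocks for g = 1 and g = 2 stay in their original order.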

data

df1 <- structure(list(g = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
    h = c("a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"),
    i = 1:12), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

Another option is the data.table S3 method of unique, called on a data.table object with by set to the columns to compare:

library(data.table)

unique(
  data.table(dat),by = c('g','h')
)

#    g h i
# 1: 1 a 1
# 2: 1 b 2
# 3: 1 c 3
# 4: 1 d 4
# 5: 2 a 5
# 6: 2 b 6
# 7: 2 c 7
# 8: 2 d 8

data

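dat <- structure(
  list(
    g = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1),
    h = c("a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"),
    i = 1:12
  ), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame")
)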