Problem description
I am trying to remove duplicated elements from my data frame.
# A tibble: 12 x 3
g h i
<dbl> <chr> <int>
1 1 a 1
2 1 b 2
3 1 c 3
4 1 d 4
5 2 a 5
6 2 b 6
7 2 c 7
8 2 d 8
9 1 a 9
10 1 b 10
11 1 c 11
12 1 d 12
Each element spans 4 rows, though, and I want those rows kept together. The result should look like this:
# A tibble: 8 x 3
g h i
<dbl> <chr> <int>
1 1 a 1
2 1 b 2
3 1 c 3
4 1 d 4
5 2 a 5
6 2 b 6
7 2 c 7
8 2 d 8
I tried distinct() and unique(), but neither worked.
Solution
We can use distinct on the selected columns (g and h), with .keep_all = TRUE so that the remaining column i is retained:
library(dplyr)
distinct(df1,g,h,.keep_all = TRUE)
-output
# g h i
#1 1 a 1
#2 1 b 2
#3 1 c 3
#4 1 d 4
#5 2 a 5
#6 2 b 6
#7 2 c 7
#8 2 d 8
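A likely reason the plain distinct() call in the question appeared to do nothing: with no columns named, every column is compared, and i is different on every row, so no row counts as a duplicate. The sketch below illustrates this on a rebuilt copy of the question's table; the name df_demo and the row-count variables are made up for illustration and are not from the original post.

```r
library(dplyr)

# Hypothetical copy of the question's 12-row table (name `df_demo` is assumed)
df_demo <- data.frame(
  g = rep(c(1, 2, 1), each = 4),
  h = rep(c("a", "b", "c", "d"), times = 3),
  i = 1:12
)

# No columns named: all of g, h and i are compared, and `i` is unique per row,
# so nothing is dropped.
n_all <- nrow(distinct(df_demo))

# Only g and h are compared: repeated (g, h) pairs are dropped and
# .keep_all = TRUE carries `i` along from the first occurrence.
n_gh <- nrow(distinct(df_demo, g, h, .keep_all = TRUE))

print(c(all_columns = n_all, g_and_h = n_gh))
```

The same reasoning applies to unique() on the whole data frame, since it also compares complete rows.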
Or use duplicated from base R:
df1[!duplicated(df1[c('g','h')]),]
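To see what the duplicated call is doing, it can help to look at the logical mask it returns before negating it. This is a minimal sketch on a rebuilt copy of the question's table; df_demo, dup_mask, first_kept and last_kept are illustrative names, not part of the original answer.

```r
# Hypothetical copy of the question's 12-row table (name `df_demo` is assumed)
df_demo <- data.frame(
  g = rep(c(1, 2, 1), each = 4),
  h = rep(c("a", "b", "c", "d"), times = 3),
  i = 1:12
)

# TRUE marks a row whose (g, h) pair has already appeared further up
dup_mask <- duplicated(df_demo[c("g", "h")])
print(which(dup_mask))

# Keep the first occurrence of each (g, h) pair
first_kept <- df_demo[!dup_mask, ]

# With fromLast = TRUE the scan runs from the bottom, so the last
# occurrence of each pair is the one that survives instead
last_kept <- df_demo[!duplicated(df_demo[c("g", "h")], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

Keeping the first occurrence matches the desired output in the question; fromLast = TRUE is only worth reaching for when the later rows are the ones to keep.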
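Data
# g and h are written out in full to match the 12-row table in the question
df1 <- structure(list(
  g = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  h = c("a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"),
  i = 1:12
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))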
Another option is to use the unique S3 method on a data.table object:
library(data.table)
unique(
data.table(dat),by = c('g','h')
)
# g h i
# 1: 1 a 1
# 2: 1 b 2
# 3: 1 c 3
# 4: 1 d 4
# 5: 2 a 5
# 6: 2 b 6
# 7: 2 c 7
# 8: 2 d 8
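Data
dat <- structure(
  list(
    # g and h are written out in full to match the 12-row table in the question
    g = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
    h = c("a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"),
    i = 1:12
  ),
  row.names = c(NA, -12L),
  class = c("tbl_df", "tbl", "data.frame")
)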