Problem description
I am trying to concatenate lines of text by character (speaker) in a data frame that looks like this:
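library(data.table)
# Three speakers, three lines each (9 rows), matching the printed output below
df <- data.frame(name = rep(c("Kyle","Cartman","Randy"), times = 3),
                 lines = c(rep("Hello", 3), rep("my name is", 3), "Kyle","Cartman","Randy"))
df <- data.table(df)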
df
## name lines
## 1 Kyle Hello
## 2 Cartman Hello
## 3 Randy Hello
## 4 Kyle my name is
## 5 Cartman my name is
## 6 Randy my name is
## 7 Kyle Kyle
## 8 Cartman Cartman
## 9 Randy Randy
The data frame I want should look like this:
df
## name lines
## 1 Kyle Hello my name is Kyle
## 2 Cartman Hello my name is Cartman
## 3 Randy Hello my name is Randy
After some research I found a solution in "Concatenate rows in a dataframe", but I don't know how to remove the duplicated rows:
library(stringr)  # provides str_c()
df <- df[, newlines := str_c(lines, collapse = " "), by = name]
df
## name lines
## 1 Kyle Hello my name is Kyle
## 2 Cartman Hello my name is Cartman
## 3 Randy Hello my name is Randy
## 4 Kyle Hello my name is Kyle
## 5 Cartman Hello my name is Cartman
## 6 Randy Hello my name is Randy
## 7 Kyle Hello my name is Kyle
## 8 Cartman Hello my name is Cartman
## 9 Randy Hello my name is Randy
Is there perhaps another way to concatenate the rows so that I can avoid the duplicates in the data frame?
Solution
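We need to summarise the column rather than assign (:=) it. := adds the new column by reference and keeps every original row, which is why each concatenated string appears three times. Wrapping the expression in .() returns a single row per group instead.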
library(data.table)
# One row per name: collapse that speaker's lines into a single string
df[, .(lines = paste(lines, collapse = " ")), by = name]
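To make the difference between summarising and assigning concrete, here is a minimal, self-contained sketch. It rebuilds the example data from the printed output in the question, and the object names dt, summarised, assigned and deduped are made up for illustration. It also shows one possible way to drop the duplicates if you prefer to keep the := approach, namely unique() on the two columns of interest.

```r
library(data.table)

# Example data reconstructed from the question's printed output (assumption:
# three speakers, each with the same three lines in order).
dt <- data.table(
  name  = rep(c("Kyle", "Cartman", "Randy"), times = 3),
  lines = c(rep("Hello", 3), rep("my name is", 3), "Kyle", "Cartman", "Randy")
)

# Summarising: .() in j returns one row per group defined by `by`.
summarised <- dt[, .(lines = paste(lines, collapse = " ")), by = name]

# Assigning: := keeps all nine rows, so duplicates must be removed afterwards.
# copy() stops := from modifying dt by reference.
assigned <- copy(dt)[, newlines := paste(lines, collapse = " "), by = name]
deduped  <- unique(assigned[, .(name, lines = newlines)])

print(summarised)
print(all.equal(summarised, deduped))
```

Both routes should give the same three-row table. The summarising form is shorter and avoids building the nine-row intermediate result.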