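Problem description
I want to add totals to my data frame, but I'm struggling because the data is very messy (as always!): some columns are text, some are dates, some are numbers. I can't post the actual data because it is sensitive, but I'll show a representative example with the same structure (image below; the required columns are in yellow). I've been trying to do this with dplyr and pipes, but because of the mix of text and numbers ...
Data: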
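date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020",
          "18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")
type <- c("type A", "type B", "type A", "type B",
          "type A", "type B", "type A", "type B")
location <- c("USA", "USA", "India", "India", "USA", "USA", "India", "India")
value <- c("10", "10", "frak", "frak", "15", "15", "open", "open")
df <- data.frame(date, type, location, value)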
Basically, I need to summarise by date, type and location.
[image]
Solution
Not sure if this is what you want.
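library(dplyr)

df %>%
  # overwrite type with a constant label so each date/location pair forms one group
  group_by(date, type = "total_type", location) %>%
  # non-numeric text such as "frak" is coerced to NA, so those totals are NA
  summarise(value = sum(as.numeric(value), na.rm = FALSE)) %>%
  mutate(value = as.character(value)) %>%
  bind_rows(df)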
# A tibble: 12 x 4
# Groups: date,type [6]
date type location value
<chr> <chr> <chr> <chr>
1 17/08/2020 total_type India NA
2 17/08/2020 total_type USA 20
3 18/08/2020 total_type India NA
4 18/08/2020 total_type USA 30
5 17/08/2020 type A USA 10
6 17/08/2020 type B USA 10
7 17/08/2020 type A India frak
8 17/08/2020 type B India frak
9 18/08/2020 type A USA 15
10 18/08/2020 type B USA 15
11 18/08/2020 type A India open
12 18/08/2020 type B India open
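Grouping by all columns except value would simply reproduce the original table, and in the image the summary rows have the type total_type. On the other hand, all the summarised rows in the image have the location USA, which doesn't make sense either, so I left location as it is.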
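To see why grouping by every column except value only reproduces the original rows, a quick base R check may help. This is a sketch that assumes the eight-row data implied by the printed output; the name keys is introduced here purely for illustration.

```r
# Data as implied by the printed output (assumption: eight rows)
df <- data.frame(
  date     = rep(c("17/08/2020", "18/08/2020"), each = 4),
  type     = rep(c("type A", "type B"), times = 4),
  location = rep(c("USA", "USA", "India", "India"), times = 2),
  value    = c("10", "10", "frak", "frak", "15", "15", "open", "open"),
  stringsAsFactors = FALSE
)

# Every date/type/location combination occurs exactly once,
# so a sum per combination would just return the row itself
keys <- unique(df[c("date", "type", "location")])
nrow(keys) == nrow(df)

# Each date/location pair covers two rows, so there is something to add up
table(df$date, df$location)
```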
I would suggest the following approach, which is similar to the one proposed by @Humpelstielzchen and comes close to what you show in the picture:
library(dplyr)

df %>%
  bind_rows(
    df %>%
      group_by(date, location) %>%
      mutate(value = as.numeric(value)) %>%   # non-numeric text becomes NA
      summarise(value = sum(value, na.rm = FALSE)) %>%
      mutate(type = 'total type', value = as.character(value))
  )
Output:
date type location value
1 17/08/2020 type A USA 10
2 17/08/2020 type B USA 10
3 17/08/2020 type A India frak
4 17/08/2020 type B India frak
5 18/08/2020 type A USA 15
6 18/08/2020 type B USA 15
7 18/08/2020 type A India open
8 18/08/2020 type B India open
9 17/08/2020 total type India <NA>
10 17/08/2020 total type USA 20
11 18/08/2020 total type India <NA>
12 18/08/2020 total type USA 30
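The NA totals for India follow from how base R coerces and sums text. The snippet below illustrates this on a small made-up vector (not the real data): as.numeric() turns non-numeric strings into NA, and sum() with na.rm = FALSE lets a single NA propagate to the result.

```r
# Made-up vector mixing numeric text and free text (illustrative only)
value <- c("10", "10", "frak", "frak")

# as.numeric() turns non-numeric text into NA and emits a coercion warning,
# which is suppressed here on purpose
num <- suppressWarnings(as.numeric(value))

sum(num, na.rm = FALSE)  # NA: one missing value makes the whole total missing
sum(num, na.rm = TRUE)   # 20: missing values are skipped
```

Note that with na.rm = TRUE a group consisting only of text would total 0 rather than NA, which may or may not be what is wanted.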
Update: here is an alternative in case the problem is caused by the package version the OP is using:
library(dplyr)

# Data
date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020",
          "18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")
type <- c("type A", "type B", "type A", "type B",
          "type A", "type B", "type A", "type B")
location <- c("USA", "USA", "India", "India", "USA", "USA", "India", "India")
value <- c("10", "10", "frak", "frak", "15", "15", "open", "open")
df <- data.frame(date, type, location, value, stringsAsFactors = FALSE)

# Build the summary rows
df1 <- df %>%
  group_by(date, location) %>%
  mutate(value = as.numeric(value)) %>%   # non-numeric text becomes NA
  summarise(value = sum(value, na.rm = FALSE)) %>%
  mutate(type = 'total type') %>%
  ungroup()
df1$value <- as.character(df1$value)

# Bind the original rows and the summary rows
df2 <- rbind(df, df1)
Output:
date type location value
1 17/08/2020 type A USA 10
2 17/08/2020 type B USA 10
3 17/08/2020 type A India frak
4 17/08/2020 type B India frak
5 18/08/2020 type A USA 15
6 18/08/2020 type B USA 15
7 18/08/2020 type A India open
8 18/08/2020 type B India open
9 17/08/2020 total type India <NA>
10 17/08/2020 total type USA 20
11 18/08/2020 total type India <NA>
12 18/08/2020 total type USA 30
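If package versions are the concern, the same table can probably be built without dplyr at all. The following is a base R sketch using aggregate(), not something taken from the answers above; it assumes the eight-row data implied by the output, and the names num and totals are introduced here for illustration.

```r
# Data as implied by the printed output (assumption: eight rows)
df <- data.frame(
  date     = rep(c("17/08/2020", "18/08/2020"), each = 4),
  type     = rep(c("type A", "type B"), times = 4),
  location = rep(c("USA", "USA", "India", "India"), times = 2),
  value    = c("10", "10", "frak", "frak", "15", "15", "open", "open"),
  stringsAsFactors = FALSE
)

# Non-numeric text becomes NA; the coercion warning is suppressed on purpose
num <- suppressWarnings(as.numeric(df$value))

# One total per date/location pair; groups containing NA get an NA total
totals <- aggregate(list(value = num), by = df[c("date", "location")], FUN = sum)
totals$type  <- "total type"
totals$value <- as.character(totals$value)

# Put the columns in the same order as df, then append the summary rows
df2 <- rbind(df, totals[names(df)])
```

The row order of the summary rows may differ from the dplyr output, since aggregate() sorts by its grouping columns in its own way.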