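Using group_by and then summarise to add a total label to a data frame, but the % figure at the total level is wrong

Problem description

I first use group_by, then sum inside summarise, to add a total label to the data frame. But the % figures at the total level are wrong, because the per-hotel percentages are simply added up. So I want to overwrite the percentage variable "percentage absent staff" with a calculation that gives the correct result. The problem is that this is a long dataset, so it cannot be done by hand. I'm looking for a good solution: a loop, or something else?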


Code:

library(dplyr)

# Assumed layout: one row per Date x Asset x Variable
# (2 dates x 2 hotels x 4 variables = 16 rows). The 02/09/2020 counts are
# chosen so that they agree with the totals printed in the answer.
Date <- rep(c("01/09/2020", "02/09/2020"), each = 8)
Asset <- rep(rep(c("Blue Hotel", "Green Hotel"), each = 4), times = 2)
Variable <- rep(c("hotel staff", "bar staff", "absent staff",
                  "percentage absent staff"), times = 4)
value <- c(5, 10, 3, 0.2, 4, 8, 2, 0.17, 5, 10, 3, 0.20, 6, 3, 3, 0.33)

df <- data.frame(Date, Asset, Variable, value)

# Create totals per Date and Variable
df2 <- df %>%
  group_by(Date, Variable) %>%
  summarise(value = sum(as.numeric(value), na.rm = FALSE)) %>%
  ungroup()

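Answer

I'm not sure which calculation you want, because the first "correct" calculation looks like absent_staff / (hotel_staff + bar_staff + absent_staff), while the second looks like absent_staff / (hotel_staff + bar_staff). Either way, you can adapt the solution below to whichever you prefer.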
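To make the difference concrete, here is a small sketch that applies both formulas to the 01/09/2020 counts from the question (Blue Hotel plus Green Hotel). It is plain arithmetic on the summed counts; the variable names are only illustrative.

```r
# Summed counts for 01/09/2020 (Blue Hotel + Green Hotel)
hotel_staff  <- 5 + 4    # 9
bar_staff    <- 10 + 8   # 18
absent_staff <- 3 + 2    # 5

# What sum() does to the percentage rows: it adds the per-hotel percentages
wrong_total <- 0.2 + 0.17

# Calculation 1: absent staff is part of the denominator (5 / 32)
pct_with_absent <- absent_staff / (hotel_staff + bar_staff + absent_staff)

# Calculation 2: absent staff is left out of the denominator (5 / 27)
pct_without_absent <- absent_staff / (hotel_staff + bar_staff)

print(c(wrong_total, pct_with_absent, pct_without_absent))
```

The first formula reproduces the 0.156 shown for 01/09/2020 in the answer's output; the second gives a somewhat larger share because the denominator is smaller.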

df2 <- df %>%
  group_by(Date, Variable) %>%
  summarise(value = sum(as.numeric(value), na.rm = FALSE)) %>%
  ungroup() %>%
  group_by(Date) %>%
  mutate(value = case_when(
    Variable == "percentage absent staff" ~
      value[which(Variable == "absent staff")] /
        sum(value[which(Variable %in% c("absent staff", "bar staff", "hotel staff"))]),
    TRUE ~ value
  ))
df2
# # A tibble: 8 x 3
# # Groups:   Date [2]
#     Date       Variable                 value
#     <chr>      <chr>                    <dbl>
# 1 01/09/2020 absent staff             5    
# 2 01/09/2020 bar staff               18    
# 3 01/09/2020 hotel staff              9    
# 4 01/09/2020 percentage absent staff  0.156
# 5 02/09/2020 absent staff             6    
# 6 02/09/2020 bar staff               13    
# 7 02/09/2020 hotel staff             11    
# 8 02/09/2020 percentage absent staff  0.2  

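Above, the summarised data is grouped by Date and the values are then replaced using a conditional expression. When Variable equals "percentage absent staff", the value becomes the "absent staff" value divided by the sum of the "absent staff", "bar staff" and "hotel staff" values. If you actually want the second calculation from above, leave "absent staff" out of that vector. For every other row, the TRUE ~ value branch returns the original value unchanged.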
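As a minimal sketch of that second variant, the snippet below starts from the per-date totals shown in the output above and leaves "absent staff" out of the denominator. The names totals, totals2 and denominator_vars are illustrative and not part of the original answer; the percentage rows start as NA because they are overwritten anyway.

```r
library(dplyr)

# Per-date totals, copied from the output above; percentage rows are
# placeholders (NA) that get overwritten below
totals <- tibble(
  Date     = rep(c("01/09/2020", "02/09/2020"), each = 4),
  Variable = rep(c("absent staff", "bar staff", "hotel staff",
                   "percentage absent staff"), times = 2),
  value    = c(5, 18, 9, NA, 6, 13, 11, NA)
)

# Second calculation: "absent staff" is NOT part of the denominator
denominator_vars <- c("bar staff", "hotel staff")

totals2 <- totals %>%
  group_by(Date) %>%
  mutate(value = case_when(
    Variable == "percentage absent staff" ~
      value[Variable == "absent staff"] /
        sum(value[Variable %in% denominator_vars]),
    TRUE ~ value
  )) %>%
  ungroup()

print(totals2)
```

With these totals the percentage rows come out as 5 / 27 for 01/09/2020 and 6 / 24 for 02/09/2020.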


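Edit

To answer the question in the comments: if the same column, Variable, also contains values with the same structure for residents, you can handle them in the same way by adding a second condition: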

library(dplyr)

# Same data as in the question (assumed layout: 2 dates x 2 hotels x 4 variables)
Date <- rep(c("01/09/2020", "02/09/2020"), each = 8)
Asset <- rep(rep(c("Blue Hotel", "Green Hotel"), each = 4), times = 2)
Variable <- rep(c("hotel staff", "bar staff", "absent staff",
                  "percentage absent staff"), times = 4)
value <- c(5, 10, 3, 0.2, 4, 8, 2, 0.17, 5, 10, 3, 0.20, 6, 3, 3, 0.33)

df <- data.frame(Date, Asset, Variable, value)

# Add a parallel set of "residents" rows with random placeholder counts
dfr <- df
dfr$Variable <- gsub("staff", "residents", dfr$Variable)
dfr$value <- rpois(nrow(dfr), 25)
df <- bind_rows(df, dfr)
df[c(1:5, 17:21), ]

# Create totals per Date and Variable
df2 <- df %>%
  group_by(Date, Variable) %>%
  summarise(value = sum(as.numeric(value), na.rm = FALSE)) %>%
  ungroup()

# Overwrite both percentage rows, one case_when() condition per group
df2a <- df2 %>%
  group_by(Date) %>%
  mutate(value = case_when(
    Variable == "percentage absent staff" ~
      value[which(Variable == "absent staff")] /
        sum(value[which(Variable %in% c("absent staff", "bar staff", "hotel staff"))]),
    Variable == "percentage absent residents" ~
      value[which(Variable == "absent residents")] /
        sum(value[which(Variable %in% c("absent residents", "bar residents", "hotel residents"))]),
    TRUE ~ value
  ))
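If more groups like "staff" and "residents" turn up, writing one condition per group gets repetitive. The sketch below is one possible generalisation, not part of the original answer: it assumes every Variable ends in the group name as its last word, derives a helper column (here called population, a made-up name) from it, and then applies a single rule per Date and population. The residents counts are invented for illustration.

```r
library(dplyr)

# Totals for one date; staff counts are the 01/09/2020 totals from the answer,
# residents counts are hypothetical
totals <- tibble(
  Date     = "01/09/2020",
  Variable = c("hotel staff", "bar staff", "absent staff",
               "percentage absent staff",
               "hotel residents", "bar residents", "absent residents",
               "percentage absent residents"),
  value    = c(9, 18, 5, NA, 20, 30, 10, NA)
)

result <- totals %>%
  # Assumption: the last word of Variable names the group ("staff", "residents")
  mutate(population = sub("^.* ", "", Variable)) %>%
  group_by(Date, population) %>%
  mutate(value = case_when(
    startsWith(Variable, "percentage absent") ~
      value[startsWith(Variable, "absent")] /
        sum(value[!startsWith(Variable, "percentage")]),
    TRUE ~ value
  )) %>%
  ungroup() %>%
  select(-population)

print(result)
```

This uses the first calculation (absent counts included in the denominator). For the second calculation, the denominator filter would also have to exclude the rows starting with "absent".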
