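Delete rows based on a match between two different columns in a data frame

Problem description

I have a data frame containing daily revenue from several channels. The data frame looks like this: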

orders_dataframe:

    Order |Channel | Revenue |
    1     |TV      | 120     |
    2     |Email   | 30      |
    3     |Retail  | 300     |
    4     |Shop1   | 50      |
    5     |Shop2   | 90      |
    6     |Email   | 20      |
    7     |Retail  | 250     |

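What I want to do is split the revenue from Retail and allocate it between Shop1 and Shop2 according to a predefined ratio (for example, a 60%/40% split). That is, every row whose revenue comes from "Retail" should be attributed 60% to Shop1 and 40% to Shop2. This can be expressed by replacing each Retail row with two new rows, as shown for orders 3 and 7 in the final table I want to end up with: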

orders_dataframe:  

    Order |Channel | Revenue |
    1     |TV      | 120     |
    2     |Email   | 30      |
    3     |Shop1   | 180     |
    3     |Shop2   | 120     |
    4     |Shop1   | 50      |
    5     |Shop2   | 90      |
    6     |Email   | 20      |
    7     |Shop1   | 150     |
    7     |Shop2   | 100     |
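
The arithmetic behind the new rows for orders 3 and 7 can be sketched in a few lines of base R. The vector names `retail_revenue` and `ratio` are made up for this illustration; the numbers come from the tables above.

```r
# Revenue of the two Retail orders (orders 3 and 7 in the table above)
retail_revenue <- c(`3` = 300, `7` = 250)
# Split ratio: 60% to Shop1, 40% to Shop2
ratio <- c(Shop1 = 0.6, Shop2 = 0.4)
# Outer product: one row per order, one column per shop
split_revenue <- outer(retail_revenue, ratio)
print(split_revenue)
```

Each row of `split_revenue` adds up to the original Retail revenue of that order, which is the property any solution should preserve.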

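Ideally, since I need to do this for a variety of datasets, I would like to take the percentages from a data frame (split_dataframe) rather than assigning the 60% and 40% figures by hand. I would like to use data from a dataset like this: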

split_dataframe:
    Channel |Percent  |
    Shop1   |60%      | 
    Shop2   |40%      | 

Here is a reproducible example of both data frames:

orders_dataframe <- data.frame(Order = c(1,2,3,4,5,6,7),Channel = c("TV","Email","Retail","Shop1","Shop2","Email","Retail"),Revenue = c(120,30,300,50,90,20,250))

split_dataframe <- data.frame(Channel = c("Shop1","Shop2"),Percent = c(0.6,0.4))
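
The table above shows the shares as "60%" and "40%", while the reproducible example stores them as fractions. If a real dataset holds them as text, a conversion along these lines may be needed first. This is a minimal sketch; `split_text` is a hypothetical name for such a text-based variant of split_dataframe.

```r
# Hypothetical variant of split_dataframe with percentages stored as text
split_text <- data.frame(Channel = c("Shop1", "Shop2"),
                         Percent = c("60%", "40%"),
                         stringsAsFactors = FALSE)
# Strip the percent sign and rescale to a fraction
split_text$Percent <- as.numeric(sub("%", "", split_text$Percent, fixed = TRUE)) / 100
# The shares should add up to 1, otherwise revenue is lost or invented
stopifnot(isTRUE(all.equal(sum(split_text$Percent), 1)))
```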

Thank you very much!

Solution

Using dplyr:

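library(dplyr)

split_dataframe %>%
  # tag every shop row with the channel it takes its revenue from
  mutate(Index = "Retail") %>%
  # pair each Retail order with each shop
  merge(., orders_dataframe, by.x = "Index", by.y = "Channel") %>%
  mutate(Revenue = Revenue * Percent) %>%
  select(Order, Channel, Revenue) %>%
  # add back the non-Retail rows and sort
  bind_rows(orders_dataframe %>% filter(Channel != "Retail"), .) %>%
  arrange(Order, Channel)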

This gives:

  Order Channel Revenue
1     1      TV     120
2     2   Email      30
3     3   Shop1     180
4     3   Shop2     120
5     4   Shop1      50
6     5   Shop2      90
7     6   Email      20
8     7   Shop1     150
9     7   Shop2     100

Here is a data.table approach. See the comments in the code for an explanation.

library( data.table )
# convert both data frames to data.tables by reference
setDT( orders_dataframe ); setDT( split_dataframe )
# split into retail and non-retail orders
orders_retail    <- orders_dataframe[ Channel == "Retail", ]
orders_no_retail <- orders_dataframe[ Channel != "Retail", ]
# divide the retail orders over the two shops (multiple steps)
# create a new column per shop, each holding the full revenue
shop_cols <- as.character( split_dataframe$Channel )
orders_retail[, (shop_cols) := Revenue ]
# melt to long format
orders_retail.melt <- melt( orders_retail, id.vars = "Order", measure.vars = shop_cols, variable.name = "Channel", value.name = "Revenue" )
# update the molten data with the percentages in split_dataframe
orders_retail.melt[ split_dataframe, Revenue := Revenue * i.Percent, on = .( Channel ) ]
# bind everything back together and sort by Order
ans <- rbind( orders_no_retail, orders_retail.melt )
setorder( ans, Order )
#    Order Channel Revenue
# 1:     1      TV     120
# 2:     2   Email      30
# 3:     3   Shop1     180
# 4:     3   Shop2     120
# 5:     4   Shop1      50
# 6:     5   Shop2      90
# 7:     6   Email      20
# 8:     7   Shop1     150
# 9:     7   Shop2     100
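
Whichever approach is used, it may be worth checking that the split did not change the total revenue of any order. The helper below is a hypothetical sketch in base R (`revenue_preserved` is not part of dplyr or data.table); it assumes both tables have `Order` and `Revenue` columns.

```r
# Hypothetical helper: TRUE if every order has the same total revenue
# in both tables
revenue_preserved <- function(before, after) {
  total_before <- tapply(before$Revenue, before$Order, sum)
  total_after  <- tapply(after$Revenue, after$Order, sum)
  identical(names(total_before), names(total_after)) &&
    isTRUE(all.equal(as.vector(total_before), as.vector(total_after)))
}

# Small example: order 3 is a Retail order split 60/40
before <- data.frame(Order = c(1, 3),
                     Channel = c("TV", "Retail"),
                     Revenue = c(120, 300))
after  <- data.frame(Order = c(1, 3, 3),
                     Channel = c("TV", "Shop1", "Shop2"),
                     Revenue = c(120, 180, 120))
print(revenue_preserved(before, after))
```

The same call should work on the full tables, for example `revenue_preserved(orders_dataframe, ans)`, since only the two named columns are accessed.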

You can do this in base R.
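
One way this could look in base R, offered as a sketch rather than this answer's original code: pair every Retail order with every shop via a cross join (`merge` with `by = NULL`), scale the revenue, and stack the result with the untouched rows. The intermediate names `is_retail`, `retail` and `result` are chosen for illustration only.

```r
orders_dataframe <- data.frame(
  Order   = 1:7,
  Channel = c("TV", "Email", "Retail", "Shop1", "Shop2", "Email", "Retail"),
  Revenue = c(120, 30, 300, 50, 90, 20, 250),
  stringsAsFactors = FALSE
)
split_dataframe <- data.frame(Channel = c("Shop1", "Shop2"),
                              Percent = c(0.6, 0.4),
                              stringsAsFactors = FALSE)

is_retail <- orders_dataframe$Channel == "Retail"
# Cross join: every Retail order paired with every shop
retail <- merge(orders_dataframe[is_retail, c("Order", "Revenue")],
                split_dataframe, by = NULL)
# Scale each paired row by the shop's share
retail$Revenue <- retail$Revenue * retail$Percent
# Stack with the untouched rows and sort by order, then channel
result <- rbind(orders_dataframe[!is_retail, ],
                retail[, c("Order", "Channel", "Revenue")])
result <- result[order(result$Order, result$Channel), ]
rownames(result) <- NULL
print(result)
```

Because the shares are read from split_dataframe, adding a third shop or changing the ratio only requires editing that table.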

