Group by multiple columns, sum one column and keep the other columns? Create a new column from the aggregated values?

Problem description

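I have a data frame of sales. I need to aggregate df by two columns, ProductID and Day, and sum the values of another column, Amount, within each group so that it shows the total. I want to keep the other columns that can be grouped (their values are the same across the rows of a group), which in this case is only Product. The last column, Store, will not be kept, because its values can differ between the rows of a group. However, I need to add a column UniqueStores that counts the number of unique stores in each group with the same ProductID and Day. For example, the first group, with ID = 1 and Day = Monday, has only one unique store, "N", so its value would be 1.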

I tried to sketch the tables here as text but could not format them properly, so here is what the table looks like before aggregation:

Table view

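I tried aggregating with group_by + summarise and with df[, sum, by], but both drop the variables that are not supplied as grouping keys. Is there a way around this without manually listing every remaining column?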

Final View

Thanks, I hope I made this clear.

Input:

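# Reconstructed so that every column has 7 values, consistent with the outputs below;
# the Store values for Wednesday and Friday are assumed
df <- data.frame("ProductID" = c(1, 1, 1, 1, 2, 2, 2),
                 "Day" = c("Monday", "Monday", "Tuesday", "Tuesday",
                           "Wednesday", "Wednesday", "Friday"),
                 "Amount" = c(5, 5, 3, 7, 6, 9, 7),
                 "Product" = c("Food", "Food", "Food", "Food", "Toys", "Toys", "Toys"),
                 "Store" = c("N", "N", "W", "S", "W", "S", "S"))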

Solution

We can group by 'ProductID', 'Day' and 'Product', and use summarise from dplyr to get the sum of 'Amount' and the number of distinct elements of 'Store' with n_distinct.

library(dplyr)

df %>%
  group_by(ProductID, Day, Product) %>%
  summarise(Amount = sum(Amount),
            UniqueStores = n_distinct(Store),
            .groups = 'drop')
# A tibble: 4 x 5
#   ProductID Day       Product Amount UniqueStores
#       <dbl> <chr>     <chr>    <dbl>        <int>
# 1         1 Monday    Food        10            1
# 2         1 Tuesday   Food        10            2
# 3         2 Friday    Toys         7            1
# 4         2 Wednesday Toys        15            2

If there are many columns and you only want to aggregate a few of them while keeping the rest, an option is to create the new columns with mutate and then keep one row per group with distinct.
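A minimal sketch of that mutate-then-distinct variant, assuming the input data shown in the question (rebuilt here so the block runs on its own; the Wednesday and Friday Store values are assumed). Only ProductID and Day are listed in group_by(); Product survives because it is constant within each group, and Store is dropped before distinct() because it varies within a group.

```r
library(dplyr)

# Sample data consistent with the question's expected output (some Store values assumed)
df <- data.frame(
  ProductID = c(1, 1, 1, 1, 2, 2, 2),
  Day       = c("Monday", "Monday", "Tuesday", "Tuesday",
                "Wednesday", "Wednesday", "Friday"),
  Amount    = c(5, 5, 3, 7, 6, 9, 7),
  Product   = c("Food", "Food", "Food", "Food", "Toys", "Toys", "Toys"),
  Store     = c("N", "N", "W", "S", "W", "S", "S")
)

res <- df %>%
  group_by(ProductID, Day) %>%
  # overwrite Amount with the group total and count the distinct stores per group
  mutate(Amount = sum(Amount), UniqueStores = n_distinct(Store)) %>%
  ungroup() %>%
  # Store differs within a group, so drop it before collapsing rows
  select(-Store) %>%
  # the rows of each group are now identical; keep the first one
  distinct()

res
```

Unlike summarise(), this keeps rows in their original order of first appearance rather than sorting by the grouping keys.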

Or, in data.table:

library(data.table)

setDT(df)[, .(Amount = sum(Amount, na.rm = TRUE),
              UniqueStores = uniqueN(Store, na.rm = TRUE)),
          by = .(ProductID, Day, Product)]

Output:

   ProductID       Day Product Amount UniqueStores
1:         1    Monday    Food     10            1
2:         1   Tuesday    Food     10            2
3:         2 Wednesday    Toys     15            2
4:         2    Friday    Toys      7            1
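The question also asks how to avoid typing every remaining column. One possible sketch, assuming the columns to keep are simply all columns except Amount and Store (and that they are constant within each ProductID/Day group, otherwise they would split the groups), builds the grouping vector with setdiff() and passes it to data.table's `by`, which accepts a character vector of column names. The data is the question's input, rebuilt here with some Store values assumed.

```r
library(data.table)

# Sample data consistent with the question's expected output (some Store values assumed)
dt <- data.table(
  ProductID = c(1, 1, 1, 1, 2, 2, 2),
  Day       = c("Monday", "Monday", "Tuesday", "Tuesday",
                "Wednesday", "Wednesday", "Friday"),
  Amount    = c(5, 5, 3, 7, 6, 9, 7),
  Product   = c("Food", "Food", "Food", "Food", "Toys", "Toys", "Toys"),
  Store     = c("N", "N", "W", "S", "W", "S", "S")
)

# group by every column except the one being summed and the one being counted
grp_cols <- setdiff(names(dt), c("Amount", "Store"))

res <- dt[, .(Amount = sum(Amount, na.rm = TRUE),
              UniqueStores = uniqueN(Store, na.rm = TRUE)),
          by = grp_cols]

res
```

Adding a new constant column to the data then requires no change to the aggregation code.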