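Problem description
I have a data frame, and I want to create 2 new variables by grouping on Id.
First, I need to group by Id and get the most recent date from CreatedDate. Then I need to get the Lead_DataSource__c value that belongs to that most recent date.
Here is the tail of my data frame: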
tail(df)
Id CreatedDate Lead_DataSource__c StageName
0011000001XW3YZAA1 2020-07-17 Walk in Quotation
0011000001XW3Z8AAL 2020-07-17 Walk in Quotation
0011000001XW3zHAAT 2020-07-17 Walk in Assigned
0011000001XW3zlAAD 2020-07-17 Walk in Quotation
0011000001XW3zvAAD 2020-07-17 Walk in Closed Lost
0011000001XW3zvAAD 2020-07-17 Website Closed Lost
Here is my code:
df_new<-df %>% group_by(Id)%>%
mutate(numberoflead=length(Id)) %>% #number of lead
mutate(lastcreateddateoflead=max(CreatedDate)) %>%#last date of lead
mutate(lasttouch =max(CreatedDate)[Lead_DataSource__c])%>% #last touch
When I run this code I get no errors, and it seems to work for both numberoflead and lastcreateddateoflead, but it does not seem to work for lasttouch.
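Solution
Your problem is that you are using mutate where you should be using summarize. After summarizing, you need to join back to the original df to get lasttouch. If you put a select inside the join, you get only the lasttouch column, and you do not need to rename or drop anything afterwards.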
library(dplyr)
df %>%
  group_by(Id) %>%
  summarize(numberoflead = n(), lastcreateddateoflead = max(CreatedDate)) %>%
  inner_join(
    df %>% select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
    by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate")
  )
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
Id numberoflead lastcreateddateoflead lasttouch
<chr> <int> <date> <chr>
1 0011000001XW3YZAA1 1 2020-07-17 Walk in
2 0011000001XW3Z8AAL 1 2020-07-17 Walk in
3 0011000001XW3zHAAT 1 2020-07-17 Walk in
4 0011000001XW3zlAAD 1 2020-07-17 Walk in
5 0011000001XW3zvAAD 2 2020-07-17 Walk in
6 0011000001XW3zvAAD 2 2020-07-17 Website
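Note that the last Id appears twice in this output, because two of its rows share the same maximum CreatedDate and the join matches both. If exactly one row per Id is wanted, something like the following might work. This is only a sketch: the sample data is re-typed from the tail(df) in the question, one_per_id is a made-up name, and it assumes dplyr 1.0.0 or later for slice_max().

```r
library(dplyr)

# Hypothetical sample data, re-typed from the tail(df) shown in the question.
df <- tibble(
  Id = c("0011000001XW3YZAA1", "0011000001XW3Z8AAL", "0011000001XW3zHAAT",
         "0011000001XW3zlAAD", "0011000001XW3zvAAD", "0011000001XW3zvAAD"),
  CreatedDate = as.Date(rep("2020-07-17", 6)),
  Lead_DataSource__c = c("Walk in", "Walk in", "Walk in",
                         "Walk in", "Walk in", "Website"),
  StageName = c("Quotation", "Quotation", "Assigned",
                "Quotation", "Closed Lost", "Closed Lost")
)

one_per_id <- df %>%
  group_by(Id) %>%
  # Count the leads before dropping rows, so the count covers the whole group.
  mutate(numberoflead = n()) %>%
  # Keep a single row per Id with the latest CreatedDate, even when dates tie.
  slice_max(CreatedDate, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Id, numberoflead,
         lastcreateddateoflead = CreatedDate,
         lasttouch = Lead_DataSource__c)
```

With with_ties = FALSE a tie is broken by keeping just one of the tied rows, so if the choice matters you would probably want a real timestamp or an extra ordering column.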
If you want to keep all rows (rather than just one summary row per Id), use your mutate in place of my summarize.
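df %>%
  group_by(Id) %>%
  mutate(numberoflead = n(), lastcreateddateoflead = max(CreatedDate)) %>%
  inner_join(
    df %>% select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
    by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate")
  )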
# A tibble: 8 x 7
# Groups: Id [5]
Id CreatedDate Lead_DataSource_~ StageName numberoflead lastcreateddateofl~ lasttouch
<chr> <date> <chr> <chr> <int> <date> <chr>
1 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
2 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
3 0011000001XW3~ 2020-07-17 Walk in Assigned 1 2020-07-17 Walk in
4 0011000001XW3~ 2020-07-17 Walk in Quotation 1 2020-07-17 Walk in
5 0011000001XW3~ 2020-07-17 Walk in Closed Lo~ 2 2020-07-17 Walk in
6 0011000001XW3~ 2020-07-17 Walk in Closed Lo~ 2 2020-07-17 Website
7 0011000001XW3~ 2020-07-17 Website Closed Lo~ 2 2020-07-17 Walk in
8 0011000001XW3~ 2020-07-17 Website Closed Lo~ 2 2020-07-17 Website
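For what it's worth, the original lasttouch line probably returns NA because max(CreatedDate) is a single unnamed value, and indexing an unnamed vector by a character vector gives NA. A join-free variant that stays close to the original mutate could look like the sketch below. The sample data is again re-typed from the question, and picking the first row on ties is my assumption, not something the question specifies.

```r
library(dplyr)

# Hypothetical sample data, re-typed from the tail(df) shown in the question.
df <- tibble(
  Id = c("0011000001XW3YZAA1", "0011000001XW3Z8AAL", "0011000001XW3zHAAT",
         "0011000001XW3zlAAD", "0011000001XW3zvAAD", "0011000001XW3zvAAD"),
  CreatedDate = as.Date(rep("2020-07-17", 6)),
  Lead_DataSource__c = c("Walk in", "Walk in", "Walk in",
                         "Walk in", "Walk in", "Website"),
  StageName = c("Quotation", "Quotation", "Assigned",
                "Quotation", "Closed Lost", "Closed Lost")
)

# The original attempt: a character index on an unnamed vector yields NA.
failed_lookup <- max(df$CreatedDate)["Walk in"]

df_new <- df %>%
  group_by(Id) %>%
  mutate(
    numberoflead = n(),
    lastcreateddateoflead = max(CreatedDate),
    # which.max() gives the position of the first maximum within the group,
    # so the data source is taken from the row holding the latest date.
    lasttouch = Lead_DataSource__c[which.max(CreatedDate)]
  ) %>%
  ungroup()
```

Unlike the join, this keeps the original number of rows, because a tie on CreatedDate resolves to the first tied row instead of matching every tied row.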
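Hey, I think I know what you are trying to do, though it may not have been phrased quite right. I believe you want the max date for each Id, and then the max date for each Lead_DataSource__c. If that is what you are after, maybe try: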
df_new <- df %>% group_by(Id) %>%
mutate(numberoflead=length(Id)) %>% #number of lead
mutate(lastcreateddateoflead=max(CreatedDate)) %>%
group_by(Lead_DataSource__c) %>%
mutate(lasttouch =max(CreatedDate)) %>%
ungroup()
Let me know if this is what you were trying to do!