如何基于事件计算SQL中的客户保留率?

问题描述

我正在尝试创建一个SQL语句,以找出哪个客户没有连续参加三个活动

表1-客户: 客户编号,客户名称

+-------------+---------------+
| Customer ID | Customer Name |
+-------------+---------------+
|          01 | Customer 01   |
|          02 | Customer 02   |
|          03 | Customer 03   |
+-------------+---------------+

表2-事件 事件ID,事件日期,事件名称

+----------------------------------+
| Event ID  Event Date  Event Name |
+----------------------------------+
| 01        01/01/2020  Event 01   |
| 02        01/15/2020  Event 02   |
| 03        02/15/2020  Event 03   |
| 04        03/13/2020  Event 04   |
| 05        05/17/2020  Event 05   |
| 06        06/20/2020  Event 06   |
+----------------------------------+

表3-事件活动 事件ID,客户ID

+----------+-------------+----+
| Event ID | Customer ID |    |
+----------+-------------+----+
|       01 |             | 01 |
|       01 |             | 02 |
|       01 |             | 03 |
|       02 |             | 01 |
|       03 |             | 01 |
|       03 |             | 02 |
|       04 |             | 01 |
|       05 |             | 01 |
|       06 |             | 01 |
|       06 |             | 03 |
+----------+-------------+----+

现在,我正在寻找连续三场未参加活动的客户。

在给定的示例中,这将是客户2和客户3。

我使用了史蒂夫的建议。这里是更新的SQL语句:

drop table if exists dbo.customer;
create table dbo.customer(
  CustID        int not null,CustName      varchar(20) not null);
insert dbo.customer(CustID,CustName) values
(1,'Cust 1'),(2,'Cust 2'),(3,'Cust 3'),(4,'Cust 4'),(5,'Cust 5')
;


drop table if exists dbo.events;
create table dbo.events(
  EventID       int not null,EventDate     date not null,EventName     varchar(20) not null);
insert dbo.events(EventId,EventDate,EventName) values
(1,'2020-01-01','Event 1'),'2020-01-15','Event 2'),'2020-02-15','Event 3'),'2020-03-13','Event 4'),'2020-05-17','Event 5'),(6,'2020-06-20','Event 6');


drop table if exists dbo.eventactivity;
create table dbo.eventactivity(
  EventID       int not null,CustID        int not null);
insert dbo.eventactivity(EventID,CustID) values
(1,1),(1,2),3),4),5),3);
(6,5);

在这里:

;with
events_sorted as (
    select e.*,row_number() over (order by EventDate) seq from dbo.events e),activity_lag as 
(
    select
      a.*,e.seq,lag(e.seq,1,0) over (partition by CustId order by e.seq) lag_seq,iif(lag(e.seq,0) over (partition by CustId order by e.seq)=0,iif((e.seq-lag(e.seq,0) over (partition by CustId order by e.seq))>1,0)) seq_break
    from dbo.eventactivity a
         join events_sorted e on a.EventID=e.EventID
),activity_lag_sum as (
    select
      alag.*,sum(seq_break) over (partition by CustId order by alag.seq) seq_grp
    from
      activity_lag alag
),three_in_a_row_cte as (
    select distinct CustId
    from activity_lag_sum
    group by CustID,seq_grp
    having count(*)>=3
    )
    select * 
from customer c
where not exists(select 1
                 from three_in_a_row_cte r
                 where c.CustID=r.CustID);

问题是,这将返回客户2,客户3,客户4-且客户2确实参加了2个事件,跳过了2个,参加了2个,因此客户2不应该出现在列表中。

有什么建议吗?

解决方法

以下查询返回的CustId如下:1)跳过了3个或更多事件,或2)总共参加了少于3个事件。

;with
events_sorted as (
    select e.*,row_number() over (order by EventDate) seq from #events e),activity_lag as 
(
    select
      a.*,e.seq,lag(e.seq,1,0) over (partition by CustId order by e.seq) lag_seq,iif(lag(e.seq,0) over (partition by CustId order by e.seq)=0,iif((e.seq-lag(e.seq,0) over (partition by CustId order by e.seq))>1,0)) seq_break
    from #eventactivity a
         join events_sorted e on a.EventID=e.EventID
)
select distinct CustId
from activity_lag
where seq-lag_seq>3
union all
select CustId
from activity_lag
group by CustId
having count(*)<3;

结果

CustId
3
4
,

您只需要连续跳过 3 个或更多事件的客户,您可以通过从事件活动表本身查询来获取。请在下面找到查询和查询结果:-

创建表格

     create table event_activity ("event_id" varchar(2),"customer_id" varchar(2))
     insert into event_activity
     values ('01','01'),('01','02'),'03'),('02',('03',('04',('05',('06',('07',('08','04'),('12',('13','05')

以上查询将产生下表:-

  event_id | customerid 
  ---------------------    
      01   |   01
      01   |   02
      01   |   03
      02   |   01
      02   |   03
      03   |   01
      03   |   02
      04   |   01
      04   |   02
      05   |   02
      06   |   01
      06   |   03
      07   |   03
      08   |   04
      12   |   04
      13   |   05
      

从上表中我们可以观察到除了客户 4 和 5 之外的所有客户都连续跳过了少于 3 个的事件。根据您的问题,我们只需要 4 和 5,因为 4 连续跳过了 3 个活动,而 5 只参加了 1 个活动。

PS : - 在这里您可以找到客户 3 也跳过了 3 个事件,但在此之前他已经跳过了 参加了一些活动,没有跳过任何所以,它必须被淘汰。

最终查询

    select c.customer_id
    from
    (
       select customer_id,skipped_count,lag(skipped_count,1) over (partition by customer_id order by event_id) 
              as ref
       from 
          ( 
             select customer_id,event_id,LAG(event_id,1) over (partition by customer_id order by event_id) 
                    as previous_event,(event_id - LAG(event_id,1) over (partition by 
                    customer_id order by event_id)-1) as skipped_count
              from 
                 (
                   select CONVERT(int,event_id) as event_id,CONVERT(int,customer_id) as customer_id 
                          from event_activity
                 )a
            )b
     )c
     join
     (
         select convert(int,customer_id) as customer_id,count(event_id) as count_event
           from event_activity
           group by customer_id
     )d
     on c.customer_id=d.customer_id
     where (skipped_count >=3 and ref is null)
     or count_event = 1
     or (skipped_count >=3 and ref > 2)

输出

    4
    5

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...