Problem description
I have two tables: activity and purchase.
Activity table:
user_id date videos_watched
1 2020-01-02 3
1 2020-01-04 5
1 2020-01-07 5
Purchase table:
user_id purchase_date
1 2020-01-01
2 2020-02-02
What I want is a 30-day rolling average of videos watched, counted from the purchase date.
The basic query looks like this:
SELECT
    DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
    AVG(a.videos_watched)
FROM purchase p
LEFT OUTER JOIN activity a
    ON p.user_id = a.user_id
    AND a.date >= p.purchase_date
    AND a.date <= DATEADD(DAY, 30, p.purchase_date)
GROUP BY 1;
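However, the activity table only has rows for days on which a video was watched. I want to fill in the gaps for the days on which no videos were watched.
I have started looking into using a CTE like this: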
WITH cte AS (
    SELECT date('2020-01-01') AS fdate
    UNION ALL
    SELECT CAST(DATEADD(day, 1, fdate) AS date)
    FROM cte
    WHERE fdate < date('2020-04-01')
)
SELECT *
FROM cte
CROSS JOIN purchases p
LEFT OUTER JOIN activity a
    ON p.user_id = a.user_id
    AND a.fdate = p.purchase_date
    AND a.date >= p.purchase_date
    AND a.date <= DATEADD(day, 30, p.purchase_date)
The end goal is to have something like this:
days_since_purchase videos_watched
1 3
2 0 --CTE coalesce inserted value
3 0
4 5
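I have spent the last few hours trying to get this right, but I still can't quite get the hang of it.
Solution
If you want to fill in gaps in the result set, then I think you should generate integers rather than dates: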
WITH cte AS (
    SELECT 1 AS day_since_purchase
    UNION ALL
    SELECT 1 + day_since_purchase
    FROM cte
    WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase, COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched
FROM cte LEFT JOIN
     (SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
             AVG(a.videos_watched) AS avg_videos_watched
      FROM purchases p JOIN
           activity a
           ON p.user_id = a.user_id AND
              a.fdate = p.purchase_date AND
              a.date >= p.purchase_date AND
              a.date <= DATEADD(day, 30, p.purchase_date)
      GROUP BY 1
     ) pa
     ON pa.day_since_purchase = cte.day_since_purchase;
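To make the integer-series idea concrete, here is a minimal sketch that runs the same pattern against the sample data from the question. It is an illustration under several assumptions, not the answer's exact query: it uses Python's built-in sqlite3 module, so DATEDIFF is replaced by a julianday() difference and DATEADD by date(..., '+30 days'); the fdate condition is dropped because that column does not appear in the sample data; and the names build_db and filled_daily_average are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
        CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
        INSERT INTO purchases VALUES (1, '2020-01-01'), (2, '2020-02-02');
        INSERT INTO activity VALUES
            (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
    """)
    return conn


# SQLite dialect (an assumption; the question's engine uses DATEDIFF/DATEADD).
QUERY = """
WITH RECURSIVE cte(day_since_purchase) AS (
    SELECT 1
    UNION ALL
    SELECT day_since_purchase + 1 FROM cte WHERE day_since_purchase < :max_day
)
SELECT cte.day_since_purchase, COALESCE(pa.avg_videos_watched, 0.0)
FROM cte
LEFT JOIN (
    SELECT CAST(julianday(a.date) - julianday(p.purchase_date) AS INTEGER)
               AS day_since_purchase,
           AVG(a.videos_watched) AS avg_videos_watched
    FROM purchases p
    JOIN activity a
      ON a.user_id = p.user_id
     AND a.date >= p.purchase_date
     AND a.date <= date(p.purchase_date, '+30 days')
    GROUP BY day_since_purchase
) pa ON pa.day_since_purchase = cte.day_since_purchase
ORDER BY cte.day_since_purchase
"""


def filled_daily_average(conn, max_day):
    """Return (day_since_purchase, avg_videos_watched) for days 1..max_day, gaps as 0."""
    return conn.execute(QUERY, {"max_day": max_day}).fetchall()


if __name__ == "__main__":
    for row in filled_daily_average(build_db(), 6):
        print(row)
```

With the sample rows, user 1's activity falls on days 1, 3 and 6 after the purchase, so those days carry values and days 2, 4 and 5 are filled with 0. Note that the inner aggregate only averages over users who have a row on a given day; users without activity on that day do not pull the average down.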
You can use a recursive query to generate the 30 days following each purchase, then bring in the activity table:
with cte as (
    select
        purchase_date, user_id, 0 days_since_purchase, purchase_date dt
    from purchases
    union all
    select
        purchase_date, user_id, days_since_purchase + 1,
        dateadd(day, days_since_purchase + 1, purchase_date)
    from cte
    where days_since_purchase < 30
)
select
    c.days_since_purchase, avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
    on a.user_id = c.user_id
    and a.fdate = c.purchase_date
    and a.date = c.dt
group by c.days_since_purchase
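It is not clear from your question whether the activity table has a column that stores the purchase date each row relates to. Your query uses a column fdate, but it does not appear in the sample data. I used that column in the query (without it, you could end up counting the same activity under different purchases).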
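As a rough illustration of the per-purchase calendar approach, the sketch below generates one row per purchase per day and left joins the activity onto it. It rests on the same kind of assumptions as any port of this query: it runs on Python's built-in sqlite3 module, so DATEADD becomes date(purchase_date, '+N days'); the fdate condition is omitted because the sample data has no such column, which is only safe while each user has a single purchase; and the names build_db and per_purchase_average are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
        CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
        INSERT INTO purchases VALUES (1, '2020-01-01'), (2, '2020-02-02');
        INSERT INTO activity VALUES
            (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
    """)
    return conn


# SQLite dialect (an assumption; the answer's engine uses DATEADD).
QUERY = """
WITH RECURSIVE cte(user_id, purchase_date, days_since_purchase, dt) AS (
    -- Anchor: day 0 of every purchase.
    SELECT user_id, purchase_date, 0, purchase_date FROM purchases
    UNION ALL
    -- Step: the next calendar day of the same purchase.
    SELECT user_id, purchase_date, days_since_purchase + 1,
           date(purchase_date, '+' || (days_since_purchase + 1) || ' days')
    FROM cte
    WHERE days_since_purchase < :max_day
)
SELECT c.days_since_purchase,
       AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
FROM cte c
LEFT JOIN activity a
       ON a.user_id = c.user_id
      AND a.date = c.dt
GROUP BY c.days_since_purchase
ORDER BY c.days_since_purchase
"""


def per_purchase_average(conn, max_day):
    """Average videos watched per day since purchase, counting idle days as 0."""
    return conn.execute(QUERY, {"max_day": max_day}).fetchall()


if __name__ == "__main__":
    for row in per_purchase_average(build_db(), 3):
        print(row)
```

Because every purchase contributes a row for every day, a user with no activity counts as 0 in the average. With the sample data, day 1 averages user 1's 3 videos with user 2's 0, giving 1.5 rather than 3.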