How to combine Cross Join and String Agg in BigQuery with a date difference

Problem description

I am trying to go from the first table below to the second:

| user_id | touch      | Date       | Purchase Amount
| 1       | Impression | 2020-09-12 | 0
| 1       | Impression | 2020-10-12 | 0
| 1       | Purchase   | 2020-10-13 | 125$
| 1       | Email      | 2020-10-14 | 0
| 1       | Impression | 2020-10-15 | 0
| 1       | Purchase   | 2020-10-30 | 122
| 2       | Impression | 2020-10-15 | 0
| 2       | Impression | 2020-10-16 | 0
| 2       | Email      | 2020-10-17 | 0

| user_id | path                           | Number of days between First Touch and Purchase    | Purchase Amount
| 1       | Impression,Impression,Purchase | 2020-10-13 (Purchase) - 2020-09-12 (Impression)    | 125$
| 1       | Email,Purchase                 | 2020-10-30 (Purchase) - 2020-10-14 (Email)         | 122$
| 2       | Impression,Email               | 2020-12-31 (Fixed date) - 2020-10-15 (Impression)  | 0$

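Essentially, I am trying to create a new row for each unique user in the table every time a 'Purchase' is encountered in the comma-delimited string.

I also need the difference between the first touch and the first purchase for each unique user. When a new row is created, the same calculation is repeated for that user, as shown in the example above.

From the little I have gathered, I need a mix of cross join and string agg, but when I tried using a case statement inside string agg I could not get the desired result.

Is there a better way to do this in SQL (BigQuery)?

Thanks

Solution

In other words, you need a solution that starts a new row whenever there is a Purchase touch.

Use the following query: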

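Select user_id,
       /* aggregate functions as required, e.g. string_agg(touch order by date) */
       Sum(purchase_amount)
  From
(Select t.*,
        -- descending order counts the purchases at or after each row, so every group ends with its purchase
        Sum(case when touch = 'Purchase' then 1 else 0 end) over (partition by user_id order by date desc) as sm
  From t) t
Group by user_id, sm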

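We can treat this as a gaps-and-islands problem, where each island ends with a purchase. How do we define the groups? By counting how many purchases lie ahead of each row (the current row included), which is why the query sorts in descending order.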

select user_id,
       string_agg(touch order by date) as path,
       min(date) as first_date,
       max(date) as max_date,
       date_diff(max(date), min(date), day) as cnt_days
from (
    select t.*, countif(touch = 'Purchase') over(partition by user_id order by date desc) as grp
    from mytable t
) t
group by user_id, grp
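
The grouping rule can be checked by hand outside BigQuery. The following is a minimal Python sketch of the same idea, not the SQL itself: it walks each user's rows from latest to earliest and counts the purchases seen so far, which mirrors `countif(...) over(partition by user_id order by date desc)`. The function name `islands_desc` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (user_id, touch, date).
rows = [
    (1, "Impression", date(2020, 9, 12)),
    (1, "Impression", date(2020, 10, 12)),
    (1, "Purchase", date(2020, 10, 13)),
    (1, "Email", date(2020, 10, 14)),
    (1, "Impression", date(2020, 10, 15)),
    (1, "Purchase", date(2020, 10, 30)),
    (2, "Impression", date(2020, 10, 15)),
    (2, "Impression", date(2020, 10, 16)),
    (2, "Email", date(2020, 10, 17)),
]


def islands_desc(rows):
    """Group each user's rows by the number of purchases at or after the row."""
    by_user = defaultdict(list)
    for user_id, touch, day in rows:
        by_user[user_id].append((day, touch))

    groups = defaultdict(list)
    for user_id, touches in by_user.items():
        purchases_ahead = 0
        # Walk from the latest row to the earliest, like "order by date desc".
        for day, touch in sorted(touches, reverse=True):
            if touch == "Purchase":
                purchases_ahead += 1
            groups[(user_id, purchases_ahead)].append((day, touch))

    result = []
    for (user_id, _grp), members in groups.items():
        members.sort()  # restore chronological order inside the island
        first_date, max_date = members[0][0], members[-1][0]
        path = ",".join(touch for _, touch in members)
        result.append((user_id, path, first_date, max_date, (max_date - first_date).days))
    # Order the output by user, then by the island's first date.
    return sorted(result, key=lambda r: (r[0], r[2]))


for row in islands_desc(rows):
    print(row)
```

Because the purchase row itself is included in the count, it falls into the same island as the touches that led up to it, and touches after a user's last purchase land in group 0.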

Below is for BigQuery Standard SQL

#standardSQL
select user_id,
  string_agg(touch order by date) path,
  date_diff(max(date), min(date), day) days,
  sum(amount) amount
from (
  select user_id, touch, date, amount,
    -- purchases strictly before the current row, so a purchase closes its own group
    countif(touch = 'Purchase') over win grp
  from `project.dataset.table`
  window win as (partition by user_id order by date rows between unbounded preceding and 1 preceding)
)
group by user_id, grp
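
To see why the window frame ends at `1 preceding`, here is a small Python sketch of the same logic, offered as an illustration rather than a translation of the query. It counts the purchases strictly before each row, so a purchase row stays in the group it closes. The function name `purchase_paths` is hypothetical, and amounts are assumed to be plain numbers (the `$` suffix in the question's table is dropped).

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (user_id, touch, date, amount).
# Assumption: amounts are numeric, without the "$" shown in the question.
rows = [
    (1, "Impression", date(2020, 9, 12), 0),
    (1, "Impression", date(2020, 10, 12), 0),
    (1, "Purchase", date(2020, 10, 13), 125),
    (1, "Email", date(2020, 10, 14), 0),
    (1, "Impression", date(2020, 10, 15), 0),
    (1, "Purchase", date(2020, 10, 30), 122),
    (2, "Impression", date(2020, 10, 15), 0),
    (2, "Impression", date(2020, 10, 16), 0),
    (2, "Email", date(2020, 10, 17), 0),
]


def purchase_paths(rows):
    """Return (user_id, path, days, amount) per user and purchase group."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    out = []
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        purchases_before = 0  # mirrors "unbounded preceding and 1 preceding"
        keyed = []
        for _, touch, day, amount in user_rows:
            keyed.append((purchases_before, touch, day, amount))
            if touch == "Purchase":
                purchases_before += 1  # only rows after this one see it
        for _grp, members in groupby(keyed, key=lambda k: k[0]):
            members = list(members)
            path = ",".join(m[1] for m in members)
            days = (members[-1][2] - members[0][2]).days
            amount = sum(m[3] for m in members)
            out.append((user_id, path, days, amount))
    return out


for row in purchase_paths(rows):
    print(row)
```

On the sample data this yields three groups: two for user 1 (each ending with a purchase) and one for user 2, who has no purchase.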

When applied to the sample data from your question, the output is:

[image: query result]

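Another variation: if there is no purchase among the touches, we calculate the number of days up to a fixed date that we set. How can that be added to the query above?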

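select user_id, string_agg(touch order by date) path,
  date_diff(if(countif(touch = 'Purchase') = 0, date '2020-12-31', max(date)), min(date), day) days,
  sum(amount) amount
from ( /* same subquery as in the query above */ )
group by user_id, grp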

with output:

[image: query result]
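
The fixed-date fallback can also be sketched in isolation. The Python snippet below is an illustration under stated assumptions: it handles one already-formed group of touches, `days_for_group` and `FIXED_END` are hypothetical names, and 2020-12-31 is the fixed date taken from the question's expected output.

```python
from datetime import date

# Assumption: the fixed end date from the question's expected output.
FIXED_END = date(2020, 12, 31)


def days_for_group(touches, fixed_end=FIXED_END):
    """Days from the first touch to the purchase, or to fixed_end if none.

    touches: list of (touch, date) pairs belonging to one (user_id, grp) group.
    """
    dates = [day for _, day in touches]
    has_purchase = any(touch == "Purchase" for touch, _ in touches)
    # Same choice as if(countif(touch = 'Purchase') = 0, fixed date, max(date)).
    end = max(dates) if has_purchase else fixed_end
    return (end - min(dates)).days


user_1_first_group = [
    ("Impression", date(2020, 9, 12)),
    ("Impression", date(2020, 10, 12)),
    ("Purchase", date(2020, 10, 13)),
]
user_2_group = [
    ("Impression", date(2020, 10, 15)),
    ("Impression", date(2020, 10, 16)),
    ("Email", date(2020, 10, 17)),
]

print(days_for_group(user_1_first_group))  # group that ends with a purchase
print(days_for_group(user_2_group))        # group without a purchase
```

A group that contains a purchase keeps its own last date as the end point; only purchase-free groups are measured against the fixed date.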


You can create a value for each row that corresponds to the number of earlier rows where table.touch = 'Purchase', which can then be used for grouping:

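-- number the rows per user in date order so that "earlier" is well defined
with r as (select row_number() over(order by t1.user_id, t1.date) rid, t1.* from `project.dataset.table` t1)
select t3.user_id, string_agg(t3.touch order by t3.date), sum(t3.amount), date_diff(max(t3.date), min(t3.date), day)
from (select
       -- c1 = number of purchases on rows that come before the current row
       (select countif(r1.touch = 'Purchase' and r1.rid < r2.rid) from r r1) c1, r2.* from r r2
    ) t3
group by t3.user_id, t3.c1;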