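Problem description
I am trying to calculate the average number of hours between two adjacent sessions, using the data in the following table: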
user_id | event_timestamp | session_num |
---|---|---|
A | 2021-04-16 10:00:00.000 UTC | 1 |
A | 2021-04-16 11:00:00.000 UTC | 2 |
A | 2021-04-16 13:00:00.000 UTC | 3 |
A | 2021-04-16 16:00:00.000 UTC | 4 |
B | 2021-04-16 12:00:00.000 UTC | 1 |
B | 2021-04-16 14:00:00.000 UTC | 2 |
B | 2021-04-16 19:00:00.000 UTC | 3 |
C | 2021-04-16 10:00:00.000 UTC | 1 |
C | 2021-04-16 17:00:00.000 UTC | 2 |
C | 2021-04-16 18:00:00.000 UTC | 3 |
So, for user A, we have:
1 hour between session_num = 2 and session_num = 1, 2 hours between session_num = 3 and session_num = 2, and 3 hours between session_num = 4 and session_num = 3.
The same goes for the other users:
User B: 2 and 5 hours;
User C: 7 and 1 hours.
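The result I expect is the arithmetic mean of these date_diff(HOUR) values.
So avg(1, 2, 3, 2, 5, 7, 1) = 3 hours is the average time between two adjacent sessions.
Does anyone know what query I could use so that the date_diff function is applied only to adjacent sessions?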
Solution
The average time between sessions for a given user is most simply calculated as:
-- nullif() avoids a division by zero for users with only one session
select
  user_id,
  timestamp_diff(max(event_timestamp), min(event_timestamp), hour) * 1.0 / nullif(count(*) - 1, 0)
from t
group by user_id;
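That is, the average time between a user's sessions is the latest timestamp minus the earliest timestamp, divided by the number of sessions minus one. This works because the gaps between adjacent sessions add up to exactly that difference. Note that this returns one average per user rather than a single overall figure.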
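As a quick sanity check of that identity, here is a minimal Python sketch (not part of the original answer) using the sample data from the question. The names `sessions`, `adjacent_gaps`, and `shortcut_average` are made up for illustration, and the timestamps are reduced to hour-of-day integers, which is only valid because every sample row falls on the same day.

```python
# Hour-of-day of each session in the question's sample data (all on 2021-04-16).
sessions = {
    "A": [10, 11, 13, 16],
    "B": [12, 14, 19],
    "C": [10, 17, 18],
}


def adjacent_gaps(hours):
    """Gaps, in hours, between consecutive sessions."""
    ordered = sorted(hours)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def shortcut_average(hours):
    """(max - min) / (count - 1); None when there are fewer than two sessions."""
    if len(hours) < 2:
        return None
    return (max(hours) - min(hours)) / (len(hours) - 1)


per_user = {}
for user, hours in sessions.items():
    gaps = adjacent_gaps(hours)
    direct = sum(gaps) / len(gaps)
    # The adjacent gaps telescope, so both computations agree.
    assert direct == shortcut_average(hours)
    per_user[user] = direct

# Average over every adjacent gap, regardless of user.
all_gaps = [gap for hours in sessions.values() for gap in adjacent_gaps(hours)]
overall = sum(all_gaps) / len(all_gaps)

print(per_user)
print(overall)
```

With this data the per-user averages are 2.0, 3.5 and 4.0 hours, while the average over all adjacent gaps is 3.0, so the per-user figures and the single overall figure are different quantities.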
Alternatively, for a single overall average across all users, try this:
with mytable as (
  select 'A' as user_id, timestamp '2021-04-16 10:00:00.000' as event_timestamp, 1 as session_num union all
  select 'A', timestamp '2021-04-16 11:00:00.000', 2 union all
  select 'A', timestamp '2021-04-16 13:00:00.000', 3 union all
  select 'A', timestamp '2021-04-16 16:00:00.000', 4 union all
  select 'B', timestamp '2021-04-16 12:00:00.000', 1 union all
  select 'B', timestamp '2021-04-16 14:00:00.000', 2 union all
  select 'B', timestamp '2021-04-16 19:00:00.000', 3 union all
  select 'C', timestamp '2021-04-16 10:00:00.000', 1 union all
  select 'C', timestamp '2021-04-16 17:00:00.000', 2 union all
  select 'C', timestamp '2021-04-16 18:00:00.000', 3
)
select avg(diff) as average
from (
  select
    user_id,
    -- lag() returns NULL for each user's first session; avg() ignores NULLs
    timestamp_diff(
      event_timestamp,
      lag(event_timestamp) over (partition by user_id order by event_timestamp),
      hour
    ) as diff
  from mytable
)
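For readers without a BigQuery project at hand, the same lag-based logic can be tried locally. The sketch below is an adaptation, not the original query: it uses Python's built-in `sqlite3` module, and because SQLite has no `timestamp_diff`, the hour difference is computed from `strftime('%s', ...)` epoch seconds instead. It assumes an SQLite build with window-function support (3.25 or later); the table name `mytable` simply mirrors the query above.

```python
import sqlite3

# Sample rows from the question: (user_id, event_timestamp, session_num).
rows = [
    ("A", "2021-04-16 10:00:00", 1),
    ("A", "2021-04-16 11:00:00", 2),
    ("A", "2021-04-16 13:00:00", 3),
    ("A", "2021-04-16 16:00:00", 4),
    ("B", "2021-04-16 12:00:00", 1),
    ("B", "2021-04-16 14:00:00", 2),
    ("B", "2021-04-16 19:00:00", 3),
    ("C", "2021-04-16 10:00:00", 1),
    ("C", "2021-04-16 17:00:00", 2),
    ("C", "2021-04-16 18:00:00", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (user_id text, event_timestamp text, session_num integer)"
)
conn.executemany("insert into mytable values (?, ?, ?)", rows)

query = """
select avg(diff) as average
from (
  select
    user_id,
    -- Epoch seconds of this session minus the previous one, converted to hours.
    -- lag() yields NULL for each user's first session, and avg() skips NULLs.
    (strftime('%s', event_timestamp)
      - strftime('%s', lag(event_timestamp) over (
          partition by user_id order by event_timestamp))) / 3600.0 as diff
  from mytable
)
"""

average = conn.execute(query).fetchone()[0]
conn.close()
print(average)
```

On the sample data this should give an average of 3 hours, matching the result expected in the question.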