How to compute a running SUM() OVER PARTITION BY for COUNT DISTINCT

Question

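I'm trying to get the number of distinct users for each event per day, while keeping a running total for every hour. I'm using Athena/Presto as the query engine.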

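My first attempt computed COUNT(DISTINCT moengageuserid) per event and hour, then took a running SUM() of those hourly counts.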


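After checking the results, however, I realized that summing COUNT(DISTINCT) values is incorrect, because distinct counts are not additive: a user who fires the same event in several different hours is counted once for each of those hours. This is the query I used: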

SELECT
    eventname,
    date(from_unixtime(time_bucket)) AS date,
    (time_bucket % 86400) / 3600 AS hour,
    count,
    SUM(count) OVER (PARTITION BY eventname, date(from_unixtime(time_bucket)) ORDER BY eventname, time_bucket) AS running_sum_count
FROM (
    SELECT
        eventname,
        CAST(eventtimestamp AS bigint) - CAST(eventtimestamp AS bigint) % 3600 AS time_bucket,
        COUNT(DISTINCT moengageuserid) AS count
    FROM clickstream.moengage
    WHERE date = '2020-08-20'
    AND eventname IN ('e1', 'e2', 'e3', 'e4')
    GROUP BY 1, 2
    ORDER BY 1, 2
);
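To make the non-additivity concrete, here is a small Python sketch that compares a running SUM() of hourly distinct counts with a true running distinct count. The users and hours are hypothetical (not data from clickstream.moengage), a single event name is assumed, and each hour stands in for one time_bucket.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical events for a single event name: (hour_of_day, moengageuserid).
EVENTS = [(0, "u1"), (0, "u2"), (1, "u1"), (1, "u3"), (2, "u2")]

# Distinct users per hour, as COUNT(DISTINCT moengageuserid) ... GROUP BY hour would give.
users_by_hour = defaultdict(set)
for hour, user in EVENTS:
    users_by_hour[hour].add(user)
hours = sorted(users_by_hour)
hourly_distinct = [len(users_by_hour[h]) for h in hours]

# What the first query computes: a running SUM() of the hourly distinct counts.
summed_counts = list(accumulate(hourly_distinct))

# The intended result: distinct users seen from the first hour up to each hour.
true_running = []
seen = set()
for h in hours:
    seen |= users_by_hour[h]  # union, so repeat users are not counted twice
    true_running.append(len(seen))

print(hourly_distinct)  # [2, 2, 1]
print(summed_counts)    # [2, 4, 5]
print(true_running)     # [2, 3, 3]
```

u1 and u2 each reappear in a later hour, so the summed counts drift above the real number of distinct users (5 vs 3 by hour 2).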

Next I tried nesting COUNT(DISTINCT ...) directly inside SUM() OVER (...), but that query fails:

SELECT
    eventname,
    SUM(COUNT(DISTINCT moengageuserid)) OVER (PARTITION BY eventname, time_bucket) AS running_sum
FROM (
    SELECT
        eventname,
        moengageuserid
    FROM clickstream.moengage
    WHERE date = '2020-08-20'
    AND eventname IN ('e1', 'e4')
);

Solution

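To compute a running distinct count, you can collect the user IDs into a set (an array of distinct values) with set_agg() and take its size with cardinality():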

cardinality(set_agg(moengageuserid)) OVER (PARTITION BY eventname,date(from_unixtime(time_bucket)) ORDER BY eventname,time_bucket) AS running_sum

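Because this is used as a window (analytic) function, it returns a value for every input row. With ORDER BY and the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), each row sees all rows of its (eventname, date) partition up to and including its own hour bucket, so all rows in the same hour get the same running value. You can then collapse them to one row per hour in an outer query with max() or a similar aggregate.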
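The window semantics described above can be emulated in plain Python as a sanity check. This is a sketch under assumptions, not Presto itself: the rows are hypothetical, `running_set_cardinality` and `collapse_per_hour` are illustrative helper names, and the frame rule (every row of the partition with `time_bucket` less than or equal to the current row's) models the default RANGE frame.

```python
from collections import defaultdict

# Hypothetical raw rows, one per event: (eventname, date, time_bucket, moengageuserid).
ROWS = [
    ("e1", "2020-08-20", 0, "u1"),
    ("e1", "2020-08-20", 0, "u2"),
    ("e1", "2020-08-20", 3600, "u1"),
    ("e1", "2020-08-20", 3600, "u3"),
    ("e1", "2020-08-20", 7200, "u2"),
    ("e2", "2020-08-20", 0, "u4"),
]


def running_set_cardinality(rows):
    """Emulate cardinality(set_agg(user)) OVER (PARTITION BY eventname, date ORDER BY time_bucket).

    Each row sees every row of its partition whose time_bucket is <= its own,
    so rows in the same bucket (peers) receive the same value.
    """
    partitions = defaultdict(list)
    for event, date, bucket, user in rows:
        partitions[(event, date)].append((bucket, user))

    out = []
    for (event, date), part in partitions.items():
        for bucket, _user in part:
            frame_users = {u for b, u in part if b <= bucket}  # the set_agg() result
            out.append((event, date, bucket, len(frame_users)))  # cardinality()
    return out


def collapse_per_hour(window_rows):
    """Reduce per-row window values to one value per hour, like max() in an outer query."""
    per_hour = {}
    for event, date, bucket, n in window_rows:
        key = (event, date, bucket)
        per_hour[key] = max(per_hour.get(key, 0), n)
    return dict(sorted(per_hour.items()))


result = collapse_per_hour(running_set_cardinality(ROWS))
for key, n in result.items():
    print(key, n)
```

This brute-force frame check is quadratic per partition, which is fine for illustration. Presto computes the window result itself.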


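Alternatively, count each user only at their first occurrence, then take a running sum of those first occurrences: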

SELECT eventname, date(from_unixtime(time_bucket)) AS date, (time_bucket % 86400) / 3600 AS hour,
       COUNT(DISTINCT moengageuserid) AS hour_count,
       -- inner SUM counts first occurrences per hour; outer SUM() OVER accumulates them over the day
       SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)) OVER (PARTITION BY eventname, date(from_unixtime(time_bucket)) ORDER BY time_bucket) AS running_distinct_count
FROM (SELECT eventname,
             CAST(eventtimestamp AS bigint) - CAST(eventtimestamp AS bigint) % 3600 AS time_bucket,
             moengageuserid,
             -- seqnum = 1 marks the first row for each (eventname, user) pair
             ROW_NUMBER() OVER (PARTITION BY eventname, moengageuserid ORDER BY CAST(eventtimestamp AS bigint)) AS seqnum
      FROM clickstream.moengage
      WHERE date = '2020-08-20' AND
            eventname IN ('e1', 'e2', 'e3', 'e4')
     ) m
-- group by the bucket itself so time_bucket may appear in the window ORDER BY
GROUP BY eventname, time_bucket
ORDER BY 1, 2, 3;
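The first-occurrence approach can be tried locally with Python's built-in sqlite3 module. This is a sketch that assumes a SQLite build with window-function support (3.25 or later). The `events` table and its sample rows are hypothetical stand-ins for clickstream.moengage. Because it targets SQLite rather than Presto, `date(..., 'unixepoch')` replaces `date(from_unixtime(...))`, and the nested `SUM(SUM(...)) OVER` is split into two CTE steps that compute the same thing.

```python
import sqlite3

# Hypothetical stand-in for clickstream.moengage:
# (eventname, eventtimestamp in epoch seconds, moengageuserid).
SAMPLE = [
    ("e1", 1597881600, "u1"),  # 2020-08-20 00:00 UTC
    ("e1", 1597881900, "u2"),  # 00:05
    ("e1", 1597885200, "u1"),  # 01:00, u1 already seen
    ("e1", 1597885500, "u3"),  # 01:05
    ("e1", 1597888800, "u2"),  # 02:00, u2 already seen
]

QUERY = """
WITH m AS (
    SELECT eventname,
           eventtimestamp - eventtimestamp % 3600 AS time_bucket,
           moengageuserid,
           ROW_NUMBER() OVER (PARTITION BY eventname, moengageuserid
                              ORDER BY eventtimestamp) AS seqnum
    FROM events
),
hourly AS (
    SELECT eventname, time_bucket,
           COUNT(DISTINCT moengageuserid) AS hour_count,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS first_seen  -- users new this hour
    FROM m
    GROUP BY eventname, time_bucket
)
SELECT eventname,
       date(time_bucket, 'unixepoch') AS date,
       (time_bucket % 86400) / 3600 AS hour,
       hour_count,
       SUM(first_seen) OVER (PARTITION BY eventname, date(time_bucket, 'unixepoch')
                             ORDER BY time_bucket) AS running_distinct_count
FROM hourly
ORDER BY eventname, time_bucket
"""


def running_distinct_users(rows):
    """Load rows into an in-memory table and run the first-occurrence query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (eventname TEXT, eventtimestamp INTEGER, moengageuserid TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in running_distinct_users(SAMPLE):
    print(row)
```

Each hour's hour_count can stay the same or drop while running_distinct_count only grows when a genuinely new user appears. That is the property the summed query from the question lacks.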
