Problem description
I have a table (actually the result of a large query, so please avoid joining against it) that looks like this:
date | priority | data
20200301 | 1 | 0.3
20200301 | 2 | 0.4
20200302 | 2 | 0.4
20200302 | 3 | 0.1
20200303 | 1 | 0.8
For each date, I want the date and data of the row with the lowest priority, so the query result should be:
date | priority | data
20200301 | 1 | 0.3
20200302 | 2 | 0.4
20200303 | 1 | 0.8
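Whenever I try a GROUP BY clause, the query either fails to retrieve the data column or does not support the other values in the data column.
Solution
You can use the row_number window function for this: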
CREATE TABLE t (
"date" INTEGER,"priority" INTEGER,"data" FLOAT
);
-- Sample rows matching the table in the question
INSERT INTO t
("date","priority","data")
VALUES (20200301,1,0.3),(20200301,2,0.4),(20200302,2,0.4),(20200302,3,0.1),(20200303,1,0.8);
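-- Number the rows within each date by ascending priority, then keep the first one.
-- The explicit alias avoids relying on a default column name, which varies by database.
SELECT *
FROM (
SELECT *,row_number() OVER (PARTITION BY date ORDER BY priority) AS row_number
FROM t
) f
WHERE row_number = 1;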
This returns:
+--------+--------+----+----------+
|date    |priority|data|row_number|
+--------+--------+----+----------+
|20200301|1       |0.3 |1         |
|20200302|2       |0.4 |1         |
|20200303|1       |0.8 |1         |
+--------+--------+----+----------+
As @david mentioned in the comments, filtering rows on "priority = min_priority_for_date" may be more efficient (rather than ranking the rows and filtering afterwards):
SELECT *
FROM t
WHERE (date,priority) IN (
SELECT date,MIN(priority)
FROM t
GROUP BY date
)
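
The two queries above agree on the sample data, but they can differ when two rows share the lowest priority for a date: row_number keeps exactly one row per date, while the MIN filter keeps every tied row. The following is a minimal sketch of that difference, not part of the original answer. It assumes Python's built-in sqlite3 module as a stand-in database (the question does not name one) and a SQLite build recent enough to support window functions and row values. The extra row (20200303, 1, 0.9) is hypothetical and added only to create a tie.

```python
import sqlite3

# The question's sample rows, plus one hypothetical row that ties for the
# lowest priority on 20200303 (added for illustration only).
rows = [
    (20200301, 1, 0.3),
    (20200301, 2, 0.4),
    (20200302, 2, 0.4),
    (20200302, 3, 0.1),
    (20200303, 1, 0.8),
    (20200303, 1, 0.9),  # hypothetical tie
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("date" INTEGER, "priority" INTEGER, "data" FLOAT)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Approach 1: number rows within each date by priority and keep the first.
ranked = conn.execute("""
    SELECT "date", "priority", "data"
    FROM (
        SELECT *, row_number() OVER (PARTITION BY "date" ORDER BY "priority") AS rn
        FROM t
    ) f
    WHERE rn = 1
    ORDER BY "date"
""").fetchall()

# Approach 2: keep rows whose priority equals the minimum for their date.
filtered = conn.execute("""
    SELECT "date", "priority", "data"
    FROM t
    WHERE ("date", "priority") IN (
        SELECT "date", MIN("priority") FROM t GROUP BY "date"
    )
    ORDER BY "date", "data"
""").fetchall()

print(len(ranked))    # one row per date
print(len(filtered))  # includes both tied rows for 20200303
```

If ties are possible in the real data and exactly one row per date is required, the row_number version is the safer choice; adding a tie-breaking column to its ORDER BY makes the chosen row deterministic.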