Problem description
Table: data
+-----+----------------+--------+----------------+
| ID | required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
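I want to group by name and then, for each group, select one of the rows with the earliest date.
For this data set, I want to end up with either rows 1 and 4 or rows 2 and 4.
Expected result: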
+-----+----------------+--------+----------------+
| ID | required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
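I have a query that returns rows 1, 2 and 4, but I'm not sure how to pick only one row from the first group to get the desired result. I join the grouped result back to the data table so that I can recover id and another_field after grouping: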
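-- Earliest date per name, joined back to data to recover id and another_field.
-- Every row whose date equals its group's minimum matches, so tied rows (ids 1 and 2) both come back.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
    SELECT min(required_by) AS min_date, name
    FROM data
    GROUP BY name
) agg
INNER JOIN data d
    ON d.required_by = agg.min_date AND d.name = agg.name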
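To see why this query returns three rows, here is a minimal, self-contained sketch using Python's standard sqlite3 module. Assumptions: SQLite stands in for whatever database the question uses, and required_by is a real date column, stored here as ISO-8601 text with a hypothetical year (2020) so that min() compares chronologically rather than alphabetically ("10 August" would otherwise sort before "7 August"). An ORDER BY d.id is added only to make the output order stable.

```python
import sqlite3

# Rows from the question; the year 2020 is hypothetical.
ROWS = [
    (1, "2020-08-07", "cat", "X"),
    (2, "2020-08-07", "cat", "Y"),
    (3, "2020-08-10", "cat", "Z"),
    (4, "2020-08-11", "dog", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, required_by TEXT, "
    "name TEXT, another_field TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

# The question's query, plus ORDER BY d.id so the printed order is stable.
JOIN_QUERY = """
SELECT d.id, d.name, d.required_by, d.another_field
FROM (
    SELECT min(required_by) AS min_date, name
    FROM data
    GROUP BY name
) agg
INNER JOIN data d
    ON d.required_by = agg.min_date AND d.name = agg.name
ORDER BY d.id
"""

ids = [row[0] for row in conn.execute(JOIN_QUERY)]
print(ids)  # [1, 2, 4]: both cat rows share the earliest date, so both match the join
```

The join has no way to prefer one of two rows with the same (name, min_date), which is why a tie-breaking step is needed.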
Solution
This is typically solved with a window function:
select d.id, d.name, d.required_by, d.another_field
from (
  select id, name, required_by, another_field,
         row_number() over (partition by name order by required_by) as rn
  from data
) d
where d.rn = 1;  -- exactly one row per name; which of two tied rows gets rn = 1 is arbitrary
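A runnable sketch of the window-function approach, again assuming SQLite through Python's sqlite3 module (window functions require SQLite 3.25 or newer) and ISO dates with a hypothetical year. It adds id as a second sort key, which is not part of the answer: without it the database may pick either tied cat row, which the question accepts, while with it the choice is repeatable.

```python
import sqlite3

# Rows from the question; the year 2020 is hypothetical.
ROWS = [
    (1, "2020-08-07", "cat", "X"),
    (2, "2020-08-07", "cat", "Y"),
    (3, "2020-08-10", "cat", "Z"),
    (4, "2020-08-11", "dog", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, required_by TEXT, "
    "name TEXT, another_field TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

# Number the rows of each name group by date (id breaks ties), keep number 1.
WINDOW_QUERY = """
SELECT d.id, d.name, d.required_by, d.another_field
FROM (
    SELECT id, name, required_by, another_field,
           row_number() OVER (PARTITION BY name ORDER BY required_by, id) AS rn
    FROM data
) d
WHERE d.rn = 1
ORDER BY d.name
"""

result = conn.execute(WINDOW_QUERY).fetchall()
print(result)  # [(1, 'cat', '2020-08-07', 'X'), (4, 'dog', '2020-08-11', 'A')]
```

Filtering on rn = 1 keeps exactly one row per partition, which is what distinguishes this from the join in the question. Ordering by "required_by, id desc" instead would return row 2 for cat, the other acceptable answer.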
In Postgres, using distinct on () is usually faster:
select distinct on (name) *
from data
order by name,required_by
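DISTINCT ON is Postgres-specific, so it cannot be run in SQLite. The following pure-Python sketch only mirrors what the query computes: sort by the ORDER BY keys, then keep the first row for each distinct name. Postgres requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which is why name comes first in the ORDER BY. The function name distinct_on_name and the ISO dates with a 2020 year are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Rows as (id, required_by, name, another_field); the year 2020 is hypothetical.
rows = [
    (1, "2020-08-07", "cat", "X"),
    (2, "2020-08-07", "cat", "Y"),
    (3, "2020-08-10", "cat", "Z"),
    (4, "2020-08-11", "dog", "A"),
]


def distinct_on_name(rows):
    """Emulate SELECT DISTINCT ON (name) * FROM data ORDER BY name, required_by.

    Sort by the ORDER BY keys, then keep only the first row of each run of
    equal names. Python's sort is stable, so tied rows keep their input order;
    Postgres makes no such promise unless more ORDER BY keys are given.
    """
    ordered = sorted(rows, key=itemgetter(2, 1))  # (name, required_by)
    return [next(group) for _, group in groupby(ordered, key=itemgetter(2))]


picked = distinct_on_name(rows)
print(picked)  # [(1, '2020-08-07', 'cat', 'X'), (4, '2020-08-11', 'dog', 'A')]
```

In Postgres itself, appending id to the ORDER BY (order by name, required_by, id) makes the choice among tied rows deterministic.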
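-- SQL Server; correlated on name, so tied earliest rows (ids 1 and 2) are both kept
SELECT d.[id], d.[required_by], d.[name]
FROM [test].[dbo].[data] AS d
WHERE d.[required_by] = (SELECT MIN(d2.[required_by]) FROM [test].[dbo].[data] AS d2
                         WHERE d2.[name] = d.[name])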
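If neither window functions nor DISTINCT ON are available, a correlated subquery can still return exactly one row per name by ordering inside the subquery and taking the first id. This variant is not from the answers above; it is a sketch in SQLite through Python's sqlite3 module, with the same hypothetical ISO dates. LIMIT 1 is SQLite, Postgres and MySQL syntax; in SQL Server the equivalent would be SELECT TOP 1 inside the subquery.

```python
import sqlite3

# Rows from the question; the year 2020 is hypothetical.
ROWS = [
    (1, "2020-08-07", "cat", "X"),
    (2, "2020-08-07", "cat", "Y"),
    (3, "2020-08-10", "cat", "Z"),
    (4, "2020-08-11", "dog", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, required_by TEXT, "
    "name TEXT, another_field TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)

# Keep a row only if its id is the first id of its own name group when that
# group is ordered by date, with the smaller id winning a tie.
CORRELATED_QUERY = """
SELECT d.id, d.name, d.required_by, d.another_field
FROM data d
WHERE d.id = (
    SELECT d2.id
    FROM data d2
    WHERE d2.name = d.name
    ORDER BY d2.required_by, d2.id
    LIMIT 1
)
ORDER BY d.name
"""

result = conn.execute(CORRELATED_QUERY).fetchall()
print(result)  # [(1, 'cat', '2020-08-07', 'X'), (4, 'dog', '2020-08-11', 'A')]
```

Because the subquery is evaluated per outer row, this may be slower than the window-function version on large tables, depending on the optimizer and available indexes.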