SQL: find the max date based on another non-null column

Question

I have a table like this:

|uniqueID|scandatetime       |scanfacilityname|
+--------+-------------------+----------------+
|12345678|01-01-2020 13:45:12|BALTIMORE       |
|12345678|01-02-2020 22:45:12|BALTIMORE       |
|12345678|01-04-2020 10:15:12|PHILADELPHIA    |
|12345678|01-05-2020 08:45:12|                |

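I want to return a single row containing the uniqueID, the scandatetime, and the most recent scanfacilityname (that is, the row with the maximum scandatetime where scanfacilityname is not null). I have tried the following query: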

SELECT
"uniqueID","max"(CAST("scandatetime" AS timestamp)) "timestamp",COALESCE("scanfacilityname") "scanfacilityname"
FROM
iv_scans_new.scan_data
WHERE (("partition_0" = '2020') AND ("partition_1" IN ('06','07','08'))) and  scanfacilityname is not null
group by 1,3
;

But I'm not sure whether this is correct, or whether I need the COALESCE at all.

Answers

You can use the max_by function:

select uniqueID, max(scandatetime), max_by(scanfacilityname, scandatetime)
from iv_scans_new.scan_data where scanfacilityname is not null group by uniqueID

See the documentation.

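COALESCE is not needed: max and max_by effectively ignore null values, and the scanfacilityname is not null filter in your WHERE clause already removes the rows without a facility name.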
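A fuller version of the max_by approach might look like the sketch below. It assumes a Presto/Athena-style engine (the partition_0/partition_1 columns and max_by suggest one), that scandatetime is stored as a string in MM-DD-YYYY HH:MM:SS form as in the sample table, and that blank facility names might be empty strings rather than true nulls. None of this is confirmed by the question, so adjust as needed.

```sql
-- Sketch only: assumes Presto/Athena and a varchar scandatetime
-- such as '01-04-2020 10:15:12'.
SELECT
    uniqueID,
    -- Parse the string so the comparison is chronological, not alphabetical.
    max(date_parse(scandatetime, '%m-%d-%Y %H:%i:%s')) AS scandatetime,
    -- Facility name taken from the row with the latest parsed timestamp.
    max_by(scanfacilityname,
           date_parse(scandatetime, '%m-%d-%Y %H:%i:%s')) AS scanfacilityname
FROM iv_scans_new.scan_data
WHERE partition_0 = '2020'
  AND partition_1 IN ('06', '07', '08')
  -- nullif/trim also drop blank strings (an assumption about the data).
  AND nullif(trim(scanfacilityname), '') IS NOT NULL
GROUP BY uniqueID
```

Parsing matters here because MM-DD-YYYY strings do not sort chronologically once the data spans more than one year.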


One option is to filter with a subquery:

select s.*
from iv_scans_new.scan_data s
where s.scandatetime = (
    select max(s1.scandatetime)
    from iv_scans_new.scan_data s1
    where s1.uniqueID = s.uniqueID and s1.scanfacilityname is not null
)

You can also use row_number():

select *
from (
    select 
        s.*,row_number() over(partition by uniqueID order by scandatetime desc) rn
    from iv_scans_new.scan_data s
    where scanfacilityname is not null
) s
where rn = 1
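If scandatetime is stored as a string in the format shown in the question (an assumption), the row_number() variant can order by the parsed timestamp and return only the three requested columns. This is a sketch for a Presto/Athena-style engine; the alias ranked is just an illustrative name.

```sql
-- Sketch only: assumes Presto/Athena and the string timestamp format
-- from the sample table.
SELECT uniqueID, scandatetime, scanfacilityname
FROM (
    SELECT
        s.*,
        -- Number each uniqueID's scans from newest to oldest.
        row_number() OVER (
            PARTITION BY uniqueID
            ORDER BY date_parse(scandatetime, '%m-%d-%Y %H:%i:%s') DESC
        ) AS rn
    FROM iv_scans_new.scan_data s
    WHERE scanfacilityname IS NOT NULL
) ranked
WHERE rn = 1
```

Unlike the correlated subquery, this returns exactly one row per uniqueID even when two scans share the same timestamp, although which of the tied rows is kept is arbitrary.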