Problem description
I have a table like this:
+--------+-------------------+----------------+
|uniqueID|scandatetime       |scanfacilityname|
+--------+-------------------+----------------+
|12345678|01-01-2020 13:45:12|BALTIMORE       |
|12345678|01-02-2020 22:45:12|BALTIMORE       |
|12345678|01-04-2020 10:15:12|PHILADELPHIA    |
|12345678|01-05-2020 08:45:12|                |
+--------+-------------------+----------------+
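I want to return a whole row containing uniqueID, scandatetime, and the latest scanfacilityname (i.e., the row with the maximum scandatetime where scanfacilityname is not null). I've tried the following query: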
SELECT
"uniqueID","max"(CAST("scandatetime" AS timestamp)) "timestamp",COALESCE("scanfacilityname") "scanfacilityname"
FROM
iv_scans_new.scan_data
WHERE (("partition_0" = '2020') AND ("partition_1" IN ('06','07','08'))) and scanfacilityname is not null
group by 1,3
;
But I'm not sure whether this is correct, or whether I need COALESCE.
Solution
You can use the max_by function:
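select uniqueID, max(scandatetime) as scandatetime, max_by(scanfacilityname, scandatetime) as scanfacilityname
from iv_scans_new.scan_data where scanfacilityname is not null group by uniqueID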
See the documentation for max_by.
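You don't need COALESCE: max ignores NULL values, and because of the scanfacilityname is not null filter, max_by only ever sees rows that have a facility name.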
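To make the max_by semantics concrete, here is a minimal pure-Python sketch of what max_by(scanfacilityname, scandatetime) computes per uniqueID once the scanfacilityname is not null filter is applied. It assumes the sample rows from the question, with the empty facility treated as NULL (None); the function name latest_facility is hypothetical and only mirrors the "value of x at the maximum y" rule.

```python
# Sample rows from the question; None stands in for the empty (NULL) facility name.
SAMPLE_ROWS = [
    ("12345678", "01-01-2020 13:45:12", "BALTIMORE"),
    ("12345678", "01-02-2020 22:45:12", "BALTIMORE"),
    ("12345678", "01-04-2020 10:15:12", "PHILADELPHIA"),
    ("12345678", "01-05-2020 08:45:12", None),
]


def latest_facility(rows):
    """Hypothetical helper emulating, per uniqueID:
    max(scandatetime), max_by(scanfacilityname, scandatetime)
    over rows where scanfacilityname is not null."""
    best = {}  # uniqueID -> (max scandatetime, facility at that scandatetime)
    for uid, ts, facility in rows:
        if facility is None:
            # Equivalent of the WHERE scanfacilityname IS NOT NULL filter.
            continue
        # max_by rule: keep the facility from the row with the largest scandatetime.
        if uid not in best or ts > best[uid][0]:
            best[uid] = (ts, facility)
    return best


print(latest_facility(SAMPLE_ROWS))
# {'12345678': ('01-04-2020 10:15:12', 'PHILADELPHIA')}
```

Comparing the scandatetime strings directly is only chronological here because every sample row falls in 2020; MM-DD-YYYY strings spanning several years would need to be parsed into timestamps first.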
Another option is to filter with a correlated subquery:
select s.*
from iv_scans_new.scan_data s
where s.scandatetime = (
select max(s1.scandatetime)
from iv_scans_new.scan_data s1
where s1.uniqueID = s.uniqueID and s1.scanfacilityname is not null
)
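The subquery approach can be tried locally with Python's built-in sqlite3 module. This is a sketch under the assumption that SQLite is an acceptable stand-in for the original engine (it is not Athena/Presto); it attaches an in-memory database named iv_scans_new so the query above runs verbatim against the question's sample rows.

```python
import sqlite3

# Sample rows from the question; None represents the empty (NULL) facility name.
SAMPLE_ROWS = [
    ("12345678", "01-01-2020 13:45:12", "BALTIMORE"),
    ("12345678", "01-02-2020 22:45:12", "BALTIMORE"),
    ("12345678", "01-04-2020 10:15:12", "PHILADELPHIA"),
    ("12345678", "01-05-2020 08:45:12", None),
]

# The correlated-subquery answer, unchanged.
SUBQUERY_SQL = """
select s.*
from iv_scans_new.scan_data s
where s.scandatetime = (
    select max(s1.scandatetime)
    from iv_scans_new.scan_data s1
    where s1.uniqueID = s.uniqueID and s1.scanfacilityname is not null
)
"""


def make_connection():
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named iv_scans_new so the
    # schema-qualified table name from the answer resolves in SQLite.
    conn.execute("ATTACH DATABASE ':memory:' AS iv_scans_new")
    conn.execute(
        "CREATE TABLE iv_scans_new.scan_data "
        "(uniqueID TEXT, scandatetime TEXT, scanfacilityname TEXT)"
    )
    conn.executemany("INSERT INTO iv_scans_new.scan_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


conn = make_connection()
# Text comparison of scandatetime is chronological here only because all rows are in 2020.
subquery_result = conn.execute(SUBQUERY_SQL).fetchall()
print(subquery_result)  # [('12345678', '01-04-2020 10:15:12', 'PHILADELPHIA')]
conn.close()
```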
You can also use row_number():
select *
from (
select
s.*,row_number() over(partition by uniqueID order by scandatetime desc) rn
from iv_scans_new.scan_data s
where scanfacilityname is not null
) s
where rn = 1
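The row_number() query can be checked the same way with Python's sqlite3 module, again assuming SQLite as a stand-in engine. Window functions require SQLite 3.25 or later (check sqlite3.sqlite_version); the query runs verbatim, and the extra rn column appears in the output.

```python
import sqlite3

# Sample rows from the question; None represents the empty (NULL) facility name.
SAMPLE_ROWS = [
    ("12345678", "01-01-2020 13:45:12", "BALTIMORE"),
    ("12345678", "01-02-2020 22:45:12", "BALTIMORE"),
    ("12345678", "01-04-2020 10:15:12", "PHILADELPHIA"),
    ("12345678", "01-05-2020 08:45:12", None),
]

# The row_number() answer, unchanged.
ROW_NUMBER_SQL = """
select *
from (
    select
    s.*, row_number() over(partition by uniqueID order by scandatetime desc) rn
    from iv_scans_new.scan_data s
    where scanfacilityname is not null
) s
where rn = 1
"""

conn = sqlite3.connect(":memory:")
# Attach an in-memory database named iv_scans_new so the qualified table name works.
conn.execute("ATTACH DATABASE ':memory:' AS iv_scans_new")
conn.execute(
    "CREATE TABLE iv_scans_new.scan_data "
    "(uniqueID TEXT, scandatetime TEXT, scanfacilityname TEXT)"
)
conn.executemany("INSERT INTO iv_scans_new.scan_data VALUES (?, ?, ?)", SAMPLE_ROWS)

# rn = 1 marks the latest non-null-facility scan per uniqueID (text order is
# chronological here only because all sample rows are in 2020).
ranked_result = conn.execute(ROW_NUMBER_SQL).fetchall()
print(ranked_result)  # [('12345678', '01-04-2020 10:15:12', 'PHILADELPHIA', 1)]
conn.close()
```

Unlike max_by, this returns the entire winning row, which is convenient when the table has more columns than the ones being aggregated.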