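SQL: fill NULL rows with the previous non-NULL row

Problem description

I have a BQ table that is built as a series of calendar-date snapshots joined with trx data. The query below populates the table: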

  WITH date_array_table AS (
    SELECT
      GENERATE_DATE_ARRAY(date_add(DATE(CURRENT_TIMESTAMP), interval -20 day), DATE('2020-08-22')) AS date_array
  ), dim_date AS (
    SELECT
      sn_date
    FROM
      date_array_table, UNNEST(date_array) AS sn_date
  ), data_test AS (
    select date('2020-08-20') as date, 1 as id, 1000 as num
    UNION ALL
    select date('2020-08-18') as date, 1 as id, 130 as num
    UNION ALL
    select date('2020-08-18') as date, 2 as id, 300 as num
    UNION ALL
    select date('2020-08-13') as date, 1 as id, 250 as num
  ), jjoin AS (
    select *
    from dim_date
    left join data_test
      on 1=1 and sn_date = date
  )
  select *
  from jjoin
  order by 1 desc

The result looks like this:

[image: query result, with NULL date, id and num on snapshot dates that have no trx row]

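I want to fill the NULL values in the snapshot rows with the values from the previous non-NULL row by date, for each id. I tried max and first_value, but the result is still NULL. Example: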

select sn_date,
       coalesce(num, max(num) over (partition by id order by date))
from jjoin

but it does not return the previous non-NULL row. Any suggestions? Thanks.

Expected:

--------------------------
sn_date | date | id | num
--------------------------
08/22   | 08/20| 1  | 1000
08/21   | 08/20| 1  | 1000
08/20   | 08/20| 1  | 1000
08/19   | 08/18| 1  | 130
08/18   | 08/18| 1  | 130
08/18   | 08/18| 2  | 300
08/17   | 08/13| 1  | 250
08/16   | 08/13| 1  | 250
08/15   | 08/13| 1  | 250

Solution

You can use last_value():

select sn_date, date, id, num,
       last_value(date ignore nulls) over w,
       last_value(id ignore nulls) over w,
       last_value(num ignore nulls) over w
from jjoin
window w as (order by sn_date rows between unbounded preceding and current row)

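I should note that the SQL standard supports ignore nulls on lag() as well as on first_value() and last_value(). When I think about solving this problem, I think of lag(). I think BigQuery is the only database that supports ignore nulls but does not support it on lag().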
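As a rough sketch of the last_value() idea on a tiny data set: the CTE name `sample_rows` and its three rows are made up for illustration, and the window assumes you want to order by the snapshot date ascending with a frame that ends at the current row, so that "last non-NULL value" means "most recent non-NULL value so far".

```sql
#standardSQL
-- Hypothetical sample rows; only the first one carries source data.
WITH sample_rows AS (
  SELECT DATE '2020-08-13' AS sn_date, DATE '2020-08-13' AS date, 1 AS id, 250 AS num UNION ALL
  SELECT DATE '2020-08-14', NULL, NULL, NULL UNION ALL
  SELECT DATE '2020-08-15', NULL, NULL, NULL
)
SELECT
  sn_date,
  -- Ascending order plus a frame ending at the current row:
  -- each row sees the latest non-NULL value at or before its own sn_date.
  LAST_VALUE(date IGNORE NULLS) OVER win AS date,
  LAST_VALUE(id IGNORE NULLS) OVER win AS id,
  LAST_VALUE(num IGNORE NULLS) OVER win AS num
FROM sample_rows
WINDOW win AS (ORDER BY sn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY sn_date DESC
```

With this sample, the 2020-08-14 and 2020-08-15 rows should both pick up the 2020-08-13 values. The explicit ROWS frame is a deliberate choice here: the default RANGE frame would also include other rows that share the same sn_date.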

Another answer:

Below is for BigQuery Standard SQL:

#standardSQL
SELECT sn_date,FIRST_VALUE(date IGNORE NULLS) OVER (win) AS date,FIRST_VALUE(id IGNORE NULLS) OVER (win) AS id,FIRST_VALUE(num IGNORE NULLS) OVER (win) AS num
FROM your_current_result
WINDOW win AS (ORDER BY sn_date DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

If you apply it to the current result shown in your question, as in the example below

#standardSQL
WITH your_current_result AS (
  SELECT DATE '2020-08-20' sn_date, DATE '2020-08-20' date, 1 id, 1000 num UNION ALL
  SELECT '2020-08-22', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-21', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-19', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-18', '2020-08-18', 1, 130 UNION ALL
  SELECT '2020-08-18', '2020-08-18', 2, 300 UNION ALL
  SELECT '2020-08-17', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-16', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-15', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-14', NULL, NULL, NULL UNION ALL
  SELECT '2020-08-13', '2020-08-13', 1, 250
)
SELECT sn_date,FIRST_VALUE(date IGNORE NULLS) OVER (win) AS date,FIRST_VALUE(id IGNORE NULLS) OVER (win) AS id,FIRST_VALUE(num IGNORE NULLS) OVER (win) AS num
FROM your_current_result
WINDOW win AS (ORDER BY sn_date DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

the result is

Row sn_date     date        id  num  
1   2020-08-22  2020-08-20  1   1000     
2   2020-08-21  2020-08-20  1   1000     
3   2020-08-20  2020-08-20  1   1000     
4   2020-08-19  2020-08-18  1   130  
5   2020-08-18  2020-08-18  1   130  
6   2020-08-18  2020-08-18  2   300  
7   2020-08-17  2020-08-13  1   250  
8   2020-08-16  2020-08-13  1   250  
9   2020-08-15  2020-08-13  1   250  
10  2020-08-14  2020-08-13  1   250  
11  2020-08-13  2020-08-13  1   250
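One detail worth checking: two source rows share 2020-08-18, and ordering the window by sn_date alone does not say which of them comes first, so the row that fills 2020-08-19 is not guaranteed. A minimal sketch of one way to pin this down, assuming that preferring the lowest id is acceptable (the CTE name `sample_rows` and its rows are hypothetical):

```sql
#standardSQL
-- Hypothetical sample: two source rows share sn_date 2020-08-18.
WITH sample_rows AS (
  SELECT DATE '2020-08-18' AS sn_date, DATE '2020-08-18' AS date, 1 AS id, 130 AS num UNION ALL
  SELECT DATE '2020-08-18', DATE '2020-08-18', 2, 300 UNION ALL
  SELECT DATE '2020-08-19', NULL, NULL, NULL
)
SELECT
  sn_date,
  FIRST_VALUE(date IGNORE NULLS) OVER win AS date,
  FIRST_VALUE(id IGNORE NULLS) OVER win AS id,
  FIRST_VALUE(num IGNORE NULLS) OVER win AS num
FROM sample_rows
-- The extra sort key `id` fixes the order of the two 2020-08-18 rows,
-- so the 2020-08-19 row looks ahead to id 1 first.
WINDOW win AS (ORDER BY sn_date DESC, id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
ORDER BY sn_date DESC, id
```

With this sample, the 2020-08-19 row should come back with id 1 and num 130, while the two 2020-08-18 rows keep their own values. If a different row should win the tie, change the second sort key accordingly.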