SQL LEAD function based on two columns

Problem description

I have a table with about 700 million rows. The example below shows the data for a single line_id.

LINE_ID|COLLECTION_DATE    |DSL_CARD_TYPE|
-------|-------------------|-------------|
1234567|2020-03-25 08:46:08|ADSL_PORT    |
1234567|2020-03-26 08:31:48|ADSL_PORT    |
1234567|2020-03-27 08:42:40|VDSL_PORT    |
1234567|2020-03-28 08:36:32|VDSL_PORT    |
1234567|2020-03-29 08:31:33|VDSL_PORT    |
1234567|2020-03-30 08:50:15|VDSL_PORT    |
1234567|2020-03-31 08:44:33|ADSL_PORT    |
1234567|2020-04-01 08:34:53|ADSL_PORT    |
1234567|2020-04-02 08:44:11|ADSL_PORT    |
1234567|2020-04-03 08:43:51|VDSL_PORT    |
1234567|2020-04-04 08:54:33|ADSL_PORT    |
1234567|2020-04-05 09:06:47|ADSL_PORT    |
1234567|2020-04-06 09:06:57|VDSL_PORT    |
1234567|2020-04-07 09:13:32|VDSL_PORT    |

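I need to group consecutive rows by DSL_CARD_TYPE and add a new column named Next_COLLECTION_DATE, so that each run of rows with the same DSL_CARD_TYPE collapses into one row, as shown below: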

LINE_ID|COLLECTION_DATE    |Next_COLLECTION_DATE  |DSL_CARD_TYPE|
-------|-------------------|----------------------|-------------|
1234567|2020-03-25 08:46:08|2020-03-26 08:31:48   |ADSL_PORT    |  
1234567|2020-03-27 08:42:40|2020-03-30 08:50:15   |VDSL_PORT    |
1234567|2020-03-31 08:34:53|2020-04-02 08:44:11   |ADSL_PORT    |
1234567|2020-04-03 08:43:51|2020-04-03 08:43:51   |VDSL_PORT    |   
1234567|2020-04-04 08:54:33|2020-04-05 09:06:47   |ADSL_PORT    |  
1234567|2020-04-06 09:06:57|2020-04-07 09:13:32   |VDSL_PORT    | 
  

I wrote a very clumsy and complicated query to do this, but because of the huge data volume it takes hours to run:

SELECT LINE_ID, COLLECTION_DATE,
    COALESCE(LEAD(COLLECTION_DATE) OVER (PARTITION BY LINE_ID ORDER BY COLLECTION_DATE), NOW()) AS Next_Collection_Date,
    DSL_CARD_TYPE
FROM (
    SELECT * FROM (
        SELECT
            LINE_ID, COLLECTION_DATE, DSL_CARD_TYPE,
            -- card type of the next and the previous row for the same line
            LEAD(DSL_CARD_TYPE) OVER (PARTITION BY LINE_ID ORDER BY COLLECTION_DATE) AS To_Sync_Port,
            LAG(DSL_CARD_TYPE) OVER (PARTITION BY LINE_ID ORDER BY COLLECTION_DATE) AS B_Sync_Port
        FROM
            ANALYTICS.tmp.V_PORTS_LINE_CARD_DATA_ALL
    ) abc
    -- keep rows where the card type is about to change, plus the first row per line
    WHERE DSL_CARD_TYPE <> To_Sync_Port OR B_Sync_Port IS NULL
) abc2;

Solution

This looks like a gaps-and-islands problem. In this case, the difference of two row numbers is probably the best approach:

select line_id, dsl_card_type,
       min(collection_date), max(collection_date)
from (select v.*,
             -- position of the row among all rows of the line
             row_number() over (partition by line_id order by collection_date) as seqnum,
             -- position of the row among rows of the line with the same card type
             row_number() over (partition by line_id, dsl_card_type order by collection_date) as seqnum_2
      from ANALYTICS.tmp.V_PORTS_LINE_CARD_DATA_ALL v
      where collection_date >= '2020-07-27 00:00:00'
     ) v
group by line_id, dsl_card_type, (seqnum - seqnum_2);

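Explaining why this works is a bit tricky. If you run the subquery on its own, you can see how the difference between the two row numbers stays constant across adjacent rows with the same card type, so that difference identifies each run.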
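
To make this concrete, here is a minimal sketch that runs the same row-number technique on the sample rows from the question, using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the release that added window functions). The table name `ports` and the helper names are hypothetical stand-ins for the real view, and the two out-of-sequence dates in the sample are assumed to be typos for 2020-03-31 and 2020-04-01.

```python
import sqlite3

# Sample rows for LINE_ID 1234567 (two dates adjusted, see the assumption above).
ROWS = [
    ("2020-03-25 08:46:08", "ADSL_PORT"),
    ("2020-03-26 08:31:48", "ADSL_PORT"),
    ("2020-03-27 08:42:40", "VDSL_PORT"),
    ("2020-03-28 08:36:32", "VDSL_PORT"),
    ("2020-03-29 08:31:33", "VDSL_PORT"),
    ("2020-03-30 08:50:15", "VDSL_PORT"),
    ("2020-03-31 08:44:33", "ADSL_PORT"),
    ("2020-04-01 08:34:53", "ADSL_PORT"),
    ("2020-04-02 08:44:11", "ADSL_PORT"),
    ("2020-04-03 08:43:51", "VDSL_PORT"),
    ("2020-04-04 08:54:33", "ADSL_PORT"),
    ("2020-04-05 09:06:47", "ADSL_PORT"),
    ("2020-04-06 09:06:57", "VDSL_PORT"),
    ("2020-04-07 09:13:32", "VDSL_PORT"),
]

# Inner query of the answer: number rows per line, and per line + card type.
NUMBERED = """
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY line_id
                              ORDER BY collection_date) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY line_id, dsl_card_type
                              ORDER BY collection_date) AS seqnum_2
    FROM ports p
"""


def make_connection():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ports (line_id INTEGER, collection_date TEXT, dsl_card_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO ports VALUES (1234567, ?, ?)", ROWS
    )
    return conn


def row_number_diffs(conn):
    """Return every row with both row numbers and their difference."""
    sql = f"""
        SELECT collection_date, dsl_card_type, seqnum, seqnum_2,
               seqnum - seqnum_2 AS diff
        FROM ({NUMBERED})
        ORDER BY collection_date
    """
    return conn.execute(sql).fetchall()


def find_islands(conn):
    """Collapse each run of identical card types into one row."""
    sql = f"""
        SELECT line_id, dsl_card_type,
               MIN(collection_date) AS first_date,
               MAX(collection_date) AS last_date
        FROM ({NUMBERED})
        GROUP BY line_id, dsl_card_type, seqnum - seqnum_2
        ORDER BY first_date
    """
    return conn.execute(sql).fetchall()


conn = make_connection()
for row in row_number_diffs(conn):
    print(row)
for row in find_islands(conn):
    print(row)
```

The first loop prints the intermediate row numbers, the second prints one row per run with its first and last collection date. On this sample the difference alone is not unique: the single VDSL_PORT row on 2020-04-03 and the ADSL_PORT rows that follow it share the same difference, which is why the sketch groups by dsl_card_type as well as by the difference.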
