Gaps and islands on start/end dates by group, with invalid records

Problem description

I can usually search and find a solution that fits my situation, but I have not seen any gaps-and-islands question that matches my case.

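I have an SCD Dim table that holds Type 2 Project data. When a Type 2 dimension attribute changes, the existing project record is closed and a new project record is created. When a record is closed, its RowEndDateTime column is populated with the current date/time and its RowIsCurrent flag is set to 0. A new record is then created from the closed one, with a RowStartDateTime equal to the closed record's RowEndDateTime and with the RowIsCurrent flag set to 1.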
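To make the close-and-reopen step concrete, here is a minimal sketch of it. The table variable @DimProject and its ProjectKey identity column are stand-ins invented for illustration, not the real dimension table, and the actual ADF pipeline logic may well differ.

```sql
-- Hypothetical stand-in for the Type 2 dimension table (names are assumptions).
declare @DimProject table (
    ProjectKey       int identity(1,1) primary key,
    ProjectID        varchar(20)  not null,
    RowStartDateTime datetime2(3) not null,
    RowEndDateTime   datetime2(3) null,
    RowIsCurrent     int          not null
);

-- Initial record: starts at 1900-01-01 and is open-ended.
insert into @DimProject (ProjectID, RowStartDateTime, RowEndDateTime, RowIsCurrent)
values ('202248', '1900-01-01', null, 1);

-- One timestamp is shared by the closed row and the new row,
-- so the two intervals are contiguous.
declare @Now datetime2(3) = sysdatetime();

-- Close the current record.
update @DimProject
set RowEndDateTime = @Now,
    RowIsCurrent   = 0
where ProjectID = '202248'
  and RowIsCurrent = 1;

-- Open the new current record, starting where the old one ended.
insert into @DimProject (ProjectID, RowStartDateTime, RowEndDateTime, RowIsCurrent)
values ('202248', @Now, null, 1);

select * from @DimProject order by RowStartDateTime;
```

Reusing the same @Now value for both statements is what gives the property the rest of this question relies on: each valid row starts exactly where the previous one ended.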

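I just discovered that the table contains some bad records that should not exist. I have no root cause, although my guess is that it happened while fixes were being applied to related tables that have no ADF pipeline/data flow, so that records were not closed correctly. Either way, I need to identify and delete the invalid rows, and update any other tables that reference the invalid ProjectKeys so that they use the correct ProjectKeys.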

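I have put together a query that almost does what I need, but it does not work correctly when more than one invalid row sits between a pair of valid records.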

Here is the test data:

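drop table if exists #Temp;
create table #Temp (PK int,ProjectID varchar(20),RowStartDateTime datetime2(3),RowEndDateTime datetime2(3),RowIsCurrent int,RowNum int);
insert into #Temp
select *,ROW_NUMBER() OVER(partition by ProjectID order by RowStartDateTime,isnull(RowEndDateTime,'2099-12-31')) as RowNum
from (select 596538 as PK,'131789' as ProjectID,'1900-01-01 00:00:00.000' as RowStartDateTime,'2020-05-06 07:14:21.451' as RowEndDateTime,0 as RowIsCurrent union
      select 601293,'131789','2020-05-05 07:14:40.828','2020-05-22 07:07:00.083',0 union
      select 601424,'131789','2020-05-06 07:14:21.451','2020-05-22 07:07:00.083',0 union
      select 603545,'131789','2020-05-22 07:07:00.083','2020-05-22 07:07:00.083',0 union
      select 603546,'131789','2020-05-22 07:07:00.083',NULL,1 union
      select 601443,'192105','2020-05-06 07:14:21.451',NULL,1 union
      select 601300,'192105','2020-05-05 07:14:40.828','2020-05-05 07:14:40.828',0 union
      select 484832,'192105','2020-02-11 09:45:15.112','2020-05-06 07:14:21.451',0 union
      select 483736,'192105','2020-01-31 07:48:21.447','2020-02-11 09:45:15.112',0 union
      select 482418,'192105','1900-01-01 00:00:00.000','2020-01-31 07:48:21.447',0 union
      select 662565,'201427','2020-08-25 09:34:57.674',NULL,1 union
      select 641261,'201427','2020-07-26 08:36:18.325','2020-08-25 09:34:57.674',0 union
      select 620787,'201427','2020-07-25 08:41:00.695','2020-07-26 08:36:18.325',0 union
      select 601433,'201427','2020-05-06 07:14:21.451','2020-07-25 08:41:00.695',0 union
      select 601295,'201427','2020-05-05 07:14:40.828','2020-05-05 07:14:40.828',0 union
      select 601292,'201427','1900-01-01 00:00:00.000','2020-05-06 07:14:21.451',0 union
      select 601445,'202248','2020-05-06 07:14:21.451',NULL,1 union
      select 601401,'202248','2020-04-30 00:04:32.000','2020-05-06 07:14:21.451',0 union
      select 601298,'202248','2020-05-05 07:14:40.828','2020-05-05 07:14:40.828',0 union
      select 601297,'202248','2020-05-05 07:14:40.828','2020-05-05 07:14:40.828',0 union
      select 597910,'202248','2020-04-19 08:14:52.111','2020-04-30 00:04:32.000',0 union
      select 587915,'202248','1900-01-01 00:00:00.000','2020-04-19 08:14:52.111',0) vals;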

Here is my current query:

select *,case when RowStartDateTime = '1900-01-01' then 'Keep 1' 
            else case when RowStartDateTime = RowEndDateTime then 'Delete 1'
                else case when RowStartDateTime = LAG(RowEndDateTime,1) OVER (PARTITION BY ProjectID ORDER BY RowStartDateTime) and
                                                  LAG(RowEndDateTime,1) OVER (PARTITION BY ProjectID ORDER BY RowStartDateTime) != 
                                                  LAG(RowStartDateTime,1) OVER (PARTITION BY ProjectID ORDER BY RowStartDateTime) then 'Keep 2'
                        when RowStartDateTime = LAG(RowEndDateTime,2) OVER (PARTITION BY ProjectID ORDER BY RowStartDateTime) then 'Keep 3'
                        else 'Delete 2' end end 
        end AS KeepOrDeleteRow
from #Temp 
order by ProjectID,RowNum desc

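You can see that the initial project record has a RowStartDateTime of 1900-01-01, and that the current record has a NULL RowEndDateTime and RowIsCurrent = 1. All valid records should have contiguous RowStart and RowEnd date values, for example: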

PK      ProjectID   RowStartDateTime            RowEndDateTime          RowIsCurrent    RowNum
======  =========   ================            ==============          ============    ======
601445  202248      2020-05-06 07:14:21.451     NULL                        1               4
601401  202248      2020-04-30 00:04:32.000     2020-05-06 07:14:21.451     0               3
597910  202248      2020-04-19 08:14:52.111     2020-04-30 00:04:32.000     0               2
587915  202248      1900-01-01 00:00:00.000     2020-04-19 08:14:52.111     0               1

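The problem is that when more than one invalid record sits between valid records, the KeepOrDeleteRow logic fails, because the LAG function uses hard-coded offsets (1 and 2). If you look at the records for ProjectID 202248 at the bottom of the results below, you will see that RowNums 1-5 are correct, but RowNum 6 should be a "Keep". The results are as follows: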

PK      ProjectID   RowStartDateTime            RowEndDateTime          RowIsCurrent    RowNum  KeepOrDeleteRow
======  =========   ================            ==============          ============    ======  ===============
603546  131789      2020-05-22 07:07:00.083     NULL                        1               5       Keep 3
603545  131789      2020-05-22 07:07:00.083     2020-05-22 07:07:00.083     0               4       Delete 1
601424  131789      2020-05-06 07:14:21.451     2020-05-22 07:07:00.083     0               3       Keep 3
601293  131789      2020-05-05 07:14:40.828     2020-05-22 07:07:00.083     0               2       Delete 2
596538  131789      1900-01-01 00:00:00.000     2020-05-06 07:14:21.451     0               1       Keep 1
601443  192105      2020-05-06 07:14:21.451     NULL                        1               5       Keep 3
601300  192105      2020-05-05 07:14:40.828     2020-05-05 07:14:40.828     0               4       Delete 1
484832  192105      2020-02-11 09:45:15.112     2020-05-06 07:14:21.451     0               3       Keep 2
483736  192105      2020-01-31 07:48:21.447     2020-02-11 09:45:15.112     0               2       Keep 2
482418  192105      1900-01-01 00:00:00.000     2020-01-31 07:48:21.447     0               1       Keep 1
662565  201427      2020-08-25 09:34:57.674     NULL                        1               6       Keep 2
641261  201427      2020-07-26 08:36:18.325     2020-08-25 09:34:57.674     0               5       Keep 2
620787  201427      2020-07-25 08:41:00.695     2020-07-26 08:36:18.325     0               4       Keep 2
601433  201427      2020-05-06 07:14:21.451     2020-07-25 08:41:00.695     0               3       Keep 3
601295  201427      2020-05-05 07:14:40.828     2020-05-05 07:14:40.828     0               2       Delete 1
601292  201427      1900-01-01 00:00:00.000     2020-05-06 07:14:21.451     0               1       Keep 1
601445  202248      2020-05-06 07:14:21.451     NULL                        1               6       Delete 2
601298  202248      2020-05-05 07:14:40.828     2020-05-05 07:14:40.828     0               5       Delete 1
601297  202248      2020-05-05 07:14:40.828     2020-05-05 07:14:40.828     0               4       Delete 1
601401  202248      2020-04-30 00:04:32.000     2020-05-06 07:14:21.451     0               3       Keep 2
597910  202248      2020-04-19 08:14:52.111     2020-04-30 00:04:32.000     0               2       Keep 2
587915  202248      1900-01-01 00:00:00.000     2020-04-19 08:14:52.111     0               1       Keep 1

I am hoping someone can offer a more elegant, dynamic solution that works without hard-coded values.

So the question is: what is a better (accurate) way to get the result I need?
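One direction that avoids fixed LAG offsets is to walk each project's chain of records with a recursive CTE: start at the 1900-01-01 row and repeatedly follow the row whose RowStartDateTime equals the previous row's RowEndDateTime. The sketch below runs against the #Temp test table above and is only a sketch under these assumptions: every project has exactly one 1900-01-01 row, a row whose RowStartDateTime equals its RowEndDateTime is never valid, and at most one non-zero-length row starts at any given RowEndDateTime. The CTE name Chain is made up for illustration.

```sql
with Chain as (
    -- Anchor: the initial record of each project.
    select PK, ProjectID, RowStartDateTime, RowEndDateTime
    from #Temp
    where RowStartDateTime = '1900-01-01'

    union all

    -- Recursive step: the row that starts exactly where the previous
    -- chain row ended, skipping zero-length rows (start = end).
    select t.PK, t.ProjectID, t.RowStartDateTime, t.RowEndDateTime
    from Chain c
    join #Temp t
      on t.ProjectID = c.ProjectID
     and t.RowStartDateTime = c.RowEndDateTime
    where t.RowEndDateTime is null
       or t.RowEndDateTime > t.RowStartDateTime
)
select t.*,
       case when c.PK is not null then 'Keep' else 'Delete' end as KeepOrDeleteRow
from #Temp t
left join Chain c
  on c.PK = t.PK
order by t.ProjectID, t.RowNum desc
option (maxrecursion 0);  -- lift the default limit of 100 recursion levels
```

Because every step of the recursion moves to a strictly later RowEndDateTime (or to the open-ended current row), the walk terminates regardless of how many invalid rows sit between two valid ones. On the test data this marks PK 601445 (RowNum 6 of ProjectID 202248) as Keep and flags 601293, 603545, 601300, 601295, 601297 and 601298 as Delete.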
