Find the first non-NULL value in a partition

Problem description

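I'm using SQL Server. I have a table with a person and a year (together they are unique), where another column (let's call it marital status) has NULL values. I want to impute those NULLs. Since this column usually doesn't change very often, I figured I would take the person's next non-NULL value or, if the NULL is at the end of the data, the previous non-NULL value. For example: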

Person  Year  MaritalStatus
Moe     2001  NULL
Moe     2002  NULL
Moe     2003  Married
Larry   2001  Single
Larry   2002  NULL
Larry   2003  NULL
Curly   2001  Single
Curly   2002  NULL
Curly   2003  Married

Moe's NULLs should become Married, Larry's NULLs should become Single, and Curly's NULL should become Married.
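Before writing any SQL, the rule above (the person's next non-NULL value, otherwise the previous one) can be pinned down as a small reference implementation. This is a minimal Python sketch, not part of the original question; the function name `impute_next_else_previous` and the `(person, year, status)` tuple layout are assumptions for illustration, with `None` standing in for NULL.

```python
from itertools import groupby

# Sample data from the question; None plays the role of SQL NULL.
ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]


def impute_next_else_previous(rows):
    """Return {(person, year): status}, filling each NULL with the person's
    next non-NULL status, or the previous one if no later value exists."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for person, group in groupby(ordered, key=lambda r: r[0]):
        history = list(group)  # one person's rows, oldest year first
        for i, (_, year, status) in enumerate(history):
            if status is None:
                # First non-NULL status after this row (the preferred "next" value).
                nxt = next((s for _, _, s in history[i + 1:] if s is not None), None)
                # Closest non-NULL status before this row (the fallback).
                prev = next((s for _, _, s in reversed(history[:i]) if s is not None), None)
                status = nxt if nxt is not None else prev
            result[(person, year)] = status
    return result


if __name__ == "__main__":
    for key, value in sorted(impute_next_else_previous(ROWS).items()):
        print(key, value)
```

Having this expected output on hand makes it easy to check each SQL answer below against the same nine rows.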

My idea was to use COALESCE with OVER like this (with similar logic to pick up the preceding non-NULL value):

select
    Person,Year,coalesce(MaritalStatus) over (partition by Person order by Year rows between current row and unbounded following)
from mytable

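That doesn't work, because COALESCE is not an aggregate or window function and therefore cannot take an OVER clause. Is there a simple way to do this without a CTE or subquery? I'd like to avoid those if possible, because they make the query harder for the next person to understand.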

Edit: Based on Tim's answer, I think I have something:

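-- rn numbers each person's non-NULL rows (and, separately, NULL rows) from the latest year down
WITH cte AS (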
    SELECT 
        *,ROW_NUMBER() OVER 
            (PARTITION BY Person,CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
            ORDER BY Year DESC) rn
    FROM mytable
),cte2 as (
-- maxrn: rn of the next non-NULL year; minrn: rn of the previous non-NULL year
SELECT 
    t1.Person,t1.Year,max(t2.rn) as maxrn,min(t3.rn) as minrn
FROM mytable t1
LEFT JOIN cte t2
    ON t2.Person = t1.Person AND
       t2.MaritalStatus IS NOT NULL and
       t1.year<t2.year
LEFT JOIN cte t3
    ON t3.Person = t1.Person AND
       t3.MaritalStatus IS NOT NULL and
       t1.year>t3.year
group by t1.Person,t1.Year
),cte3 as(
    select
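        -- own value first, then the next non-NULL value (t3, via maxrn), then the previous one (t4, via minrn)
        t1.person,t1.year,coalesce(t1.maritalstatus,t3.maritalstatus,t4.maritalstatus) as maritalstatus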
    from mytable t1
        left join cte2 t2
            on t1.person=t2.person and
            t1.year=t2.year
        left join cte t3
            on t1.person=t3.person and
            t3.maritalstatus is not null and
            t2.maxrn=t3.rn
        left join cte t4
            on t1.person=t4.person and
            t4.maritalstatus is not null and
            t2.minrn=t4.rn
            
)
select * from cte3
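To check this CTE approach end to end, here is a sketch that runs it on the sample rows with Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption: SQLite 3.25 or later, which supports window functions). The three CTEs are folded into one statement, and the COALESCE tries the next non-NULL value (`nxt`) before the previous one (`prv`), which is the order the question asks for; the aliases `g`, `nxt`, `prv` and the function `fill_with_ctes` are hypothetical.

```python
import sqlite3

ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]

QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY Person, CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
        ORDER BY Year DESC) AS rn
    FROM mytable
), cte2 AS (
    -- maxrn: rn of the next non-NULL year; minrn: rn of the previous non-NULL year
    SELECT t1.Person, t1.Year, MAX(t2.rn) AS maxrn, MIN(t3.rn) AS minrn
    FROM mytable t1
    LEFT JOIN cte t2 ON t2.Person = t1.Person AND t2.MaritalStatus IS NOT NULL AND t1.Year < t2.Year
    LEFT JOIN cte t3 ON t3.Person = t1.Person AND t3.MaritalStatus IS NOT NULL AND t1.Year > t3.Year
    GROUP BY t1.Person, t1.Year
)
SELECT t1.Person, t1.Year,
       COALESCE(t1.MaritalStatus, nxt.MaritalStatus, prv.MaritalStatus) AS MaritalStatus
FROM mytable t1
LEFT JOIN cte2 g ON g.Person = t1.Person AND g.Year = t1.Year
LEFT JOIN cte nxt ON nxt.Person = t1.Person AND nxt.MaritalStatus IS NOT NULL AND nxt.rn = g.maxrn
LEFT JOIN cte prv ON prv.Person = t1.Person AND prv.MaritalStatus IS NOT NULL AND prv.rn = g.minrn
"""


def fill_with_ctes(rows):
    """Run the combined CTE query and return {(person, year): status}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (Person TEXT, Year INTEGER, MaritalStatus TEXT, "
                 "PRIMARY KEY (Person, Year))")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = {(p, y): s for p, y, s in conn.execute(QUERY)}
    conn.close()
    return result
```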

Solution

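We can try the following approach. Here we apply ROW_NUMBER over partitions by person and by whether the marital status is NULL. We then use each person's most recent (latest-year) non-NULL marital status to fill in any missing (NULL) marital status values.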

WITH cte AS (
    SELECT *,ROW_NUMBER() OVER (
                  PARTITION BY Person,CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
                  ORDER BY Year DESC) rn
    FROM mytable
)

SELECT t1.Person,t1.Year,COALESCE(t1.MaritalStatus,t2.MaritalStatus) AS MaritalStatus
FROM mytable t1
LEFT JOIN cte t2
    ON t2.Person = t1.Person AND
       t2.MaritalStatus IS NOT NULL AND
       t2.rn = 1;


Demo
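This query can be tried locally with Python's `sqlite3` module standing in for SQL Server (assuming SQLite 3.25+ for ROW_NUMBER). It fills every gap with the person's latest non-NULL value rather than the next one after the gap; the two agree on the sample data, but for a sequence such as Single, NULL, Married, Single the gap would get Single, not Married. The function name `fill_with_latest` is hypothetical.

```python
import sqlite3

ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]

QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (
                  PARTITION BY Person, CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
                  ORDER BY Year DESC) AS rn
    FROM mytable
)
SELECT t1.Person, t1.Year, COALESCE(t1.MaritalStatus, t2.MaritalStatus) AS MaritalStatus
FROM mytable t1
LEFT JOIN cte t2
    ON t2.Person = t1.Person AND
       t2.MaritalStatus IS NOT NULL AND
       t2.rn = 1  -- rn = 1 is the person's latest non-NULL row
"""


def fill_with_latest(rows):
    """Run the ROW_NUMBER query and return {(person, year): status}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (Person TEXT, Year INTEGER, MaritalStatus TEXT, "
                 "PRIMARY KEY (Person, Year))")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = {(p, y): s for p, y, s in conn.execute(QUERY)}
    conn.close()
    return result
```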


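You can do this using only window functions. The key idea is to get the first year that has a marital status, and then propagate that year's marital status to all of the person's rows: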

SELECT t.*,MAX(CASE WHEN year = first_year_ms THEN MaritalStatus END) OVER (PARTITION BY person) as first_marital_status
FROM (SELECT t.*,MIN(CASE WHEN MaritalStatus IS NOT NULL THEN year END) OVER (PARTITION BY person) as first_year_ms
      FROM t
     ) t
ORDER BY person,year;
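Running this first query on the sample data shows what it does and does not cover. The sketch below uses Python's `sqlite3` as a stand-in for SQL Server (assuming SQLite 3.25+), renames the outer derived-table alias to `x` for readability, and uses a hypothetical function name `first_status_by_window`. Because the query propagates the first non-NULL value, Curly's 2002 row gets Single rather than the Married the question expects; the next-else-previous rule is what the later edit of this answer handles.

```python
import sqlite3

ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]

QUERY = """
SELECT x.Person, x.Year, x.MaritalStatus,
       MAX(CASE WHEN x.Year = x.first_year_ms THEN x.MaritalStatus END)
           OVER (PARTITION BY x.Person) AS first_marital_status
FROM (SELECT t.*,
             -- earliest year in which the person has a non-NULL status
             MIN(CASE WHEN MaritalStatus IS NOT NULL THEN Year END)
                 OVER (PARTITION BY Person) AS first_year_ms
      FROM t) x
"""


def first_status_by_window(rows):
    """Return {(person, year): first_marital_status} from the window query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Person TEXT, Year INTEGER, MaritalStatus TEXT, "
                 "PRIMARY KEY (Person, Year))")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = {(p, y): f for p, y, _, f in conn.execute(QUERY)}
    conn.close()
    return result
```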

A simpler approach is probably a lateral join (OUTER APPLY):

select *
from t outer apply
     (select top (1) t2.maritalstatus
      from t t2
      where t2.person = t.person and t2.maritalstatus is not null
      order by t2.year asc
     ) t2;

With an index on (person, maritalstatus, year), this is probably the fastest approach.

Here is a dbfiddle.
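SQLite has neither OUTER APPLY nor TOP, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1; because the applied subquery returns a single column, it yields the same value per row (NULL when nothing matches). It also creates the suggested (Person, MaritalStatus, Year) index. SQLite stands in for SQL Server here, and the index name `ix_t_person_status_year` and function `first_status_by_subquery` are hypothetical.

```python
import sqlite3

ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]

QUERY = """
SELECT o.Person, o.Year, o.MaritalStatus,
       (SELECT t2.MaritalStatus            -- stands in for OUTER APPLY (SELECT TOP (1) ...)
        FROM t t2
        WHERE t2.Person = o.Person AND t2.MaritalStatus IS NOT NULL
        ORDER BY t2.Year ASC
        LIMIT 1) AS first_status
FROM t o
"""


def first_status_by_subquery(rows):
    """Return {(person, year): first non-NULL status for that person}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Person TEXT, Year INTEGER, MaritalStatus TEXT, "
                 "PRIMARY KEY (Person, Year))")
    # Index suggested in the answer: lets the subquery seek per person.
    conn.execute("CREATE INDEX ix_t_person_status_year ON t (Person, MaritalStatus, Year)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = {(p, y): f for p, y, _, f in conn.execute(QUERY)}
    conn.close()
    return result
```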

Edit:

You can still do this with window functions, taking the next non-NULL value and falling back to the previous one:

SELECT t.*,COALESCE(MAX(MaritalStatus) OVER (PARTITION BY person,grp_after),MAX(MaritalStatus) OVER (PARTITION BY person,grp_before)
               ) as next_marital_status
FROM (SELECT t.*,COUNT(MaritalStatus) OVER (PARTITION BY person ORDER BY year DESC) as grp_after,COUNT(MaritalStatus) OVER (PARTITION BY person ORDER BY year ASC) as grp_before
      FROM t
     ) t
ORDER BY person,year;
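The trick in this version is that a running COUNT of non-NULL values only increases at non-NULL rows. Counting in descending year order therefore gives each NULL row the same group number as the nearest non-NULL row after it (grp_after), and counting in ascending order does the same for the nearest one before it (grp_before). Each group holds at most one non-NULL value, so MAX simply picks it, and a group with none (such as Larry's trailing NULLs in grp_after) falls through the COALESCE. Below is a sketch in SQLite via Python's `sqlite3` (assumed 3.25+), with the outer alias renamed to `x` and a hypothetical function name `fill_next_else_previous`.

```python
import sqlite3

ROWS = [
    ("Moe", 2001, None), ("Moe", 2002, None), ("Moe", 2003, "Married"),
    ("Larry", 2001, "Single"), ("Larry", 2002, None), ("Larry", 2003, None),
    ("Curly", 2001, "Single"), ("Curly", 2002, None), ("Curly", 2003, "Married"),
]

QUERY = """
SELECT x.Person, x.Year, x.MaritalStatus,
       COALESCE(MAX(x.MaritalStatus) OVER (PARTITION BY x.Person, x.grp_after),
                MAX(x.MaritalStatus) OVER (PARTITION BY x.Person, x.grp_before)) AS next_marital_status
FROM (SELECT t.*,
             -- running count of non-NULL values, latest year first / earliest year first
             COUNT(MaritalStatus) OVER (PARTITION BY Person ORDER BY Year DESC) AS grp_after,
             COUNT(MaritalStatus) OVER (PARTITION BY Person ORDER BY Year ASC) AS grp_before
      FROM t) x
"""


def fill_next_else_previous(rows):
    """Return {(person, year): next_marital_status} from the COUNT-group query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Person TEXT, Year INTEGER, MaritalStatus TEXT, "
                 "PRIMARY KEY (Person, Year))")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = {(p, y): n for p, y, _, n in conn.execute(QUERY)}
    conn.close()
    return result
```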

Or using APPLY:

select *
from t outer apply
     (select top (1) t2.maritalstatus
      from t t2
      where t2.person = t.person and t2.maritalstatus is not null
      order by (case when t2.year >= t.year then 1 else 2 end),(case when t2.year >= t.year then t2.year end) asc,t2.year desc
     ) t2

Here is a SQL Fiddle for this.
