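Parsing name components from the form Last,First,Middle,Suffix

Problem description

I currently have a full-name column whose values follow no normalized form. The values usually look like Last,First Middle,Suffix, but not every row follows that pattern. Some example forms appear in the FULLNAME column of the sample below.

The desired result is each component in its own column, with NULL for any component that is missing. Here are some examples of the desired result.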

FULLNAME                    FIRSTNAME  MIDDLENAME LASTNAME  SUFFIX
Johnson,John Johnny,Jr.     John       Johnny     Johnson   Jr
Anderson,Andrew A,Sr.       Andrew     A          Anderson  Sr
Smith,Smitty Jr.            Smitty     NULL       Smith     Jr
Abegnale,Frank              Frank      NULL       Abegnale  NULL
Henry,King III              King       NULL       Henry     III
Garcia,Jerome John          Jerome     John       Garcia    NULL

My current solution looks like this:

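SELECT
    FullNM,
    -- Everything before the first comma is the last name.
    SUBSTRING(FullNM, 1, CHARINDEX(',', FullNM) - 1) AS LastName,
    -- If the text after the first comma contains a space, take the final word.
    CASE
      WHEN LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(REPLACE(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99), ' ', '')) > 0
      THEN REPLACE(SUBSTRING(FullNM, LEN(FullNM) - CHARINDEX(' ', REVERSE(FullNM)) + 1, 99), '.', '')
      ELSE NULL
    END AS MiddleName,
    -- The first name is the text after the first comma, minus that final word.
    CASE
      WHEN LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(REPLACE(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99), ' ', '')) > 0
      THEN SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, (LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(SUBSTRING(FullNM, LEN(FullNM) - CHARINDEX(' ', REVERSE(FullNM)) + 1, 99))))
      ELSE SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)
    END AS FirstNM
FROM MyTable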

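Unfortunately, I cannot come up with a good way to handle the suffix, especially when there is no middle name. With the current code, a suffix that is present ends up in MiddleNM.

Any help or suggestions are much appreciated!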

Solution

Please try the following solution. It uses XQuery to tokenize the FullName column.

As Gordon mentioned, you may need to extend the list of legal suffixes.

SQL

-- DDL and sample data population,start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY,FullName VARCHAR(100));
INSERT INTO @tbl (FullName) VALUES
('Johnson,John Johnny,Jr.'),('Anderson,Andrew A,Sr.'),('Smith,Smitty Jr.'),('Abegnale,Frank'),('Henry,King III'),('Garcia,Jerome John');

DECLARE @suffix TABLE (suffix VARCHAR(10));
INSERT INTO @suffix (suffix) VALUES
('Jr.'),('III'),('Sr.');
-- DDL and sample data population,end

DECLARE @separator CHAR(1) = ',';

;WITH rs AS
(
   -- Turn each comma-separated piece of FullName into its own <x> element.
   SELECT *
      , TRY_CAST('<root><x><![CDATA[' + 
            REPLACE(FullName + SPACE(1) COLLATE Czech_BIN2, @separator, ']]></x><x><![CDATA[') + 
         ']]></x></root>' AS XML) AS xmldata
   FROM @tbl
), cte AS
(
-- x[1] is the last name; x[2] holds the first name plus an optional middle name or suffix.
SELECT rs.*, x.pos, x.size
    , LEFT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), pos - 1) AS FirstName
    , c.value('(x[1]/text())[1]', 'VARCHAR(30)') AS LastName
    , TRIM(RIGHT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), IIF((size - pos) < 0, 0, size-pos+1))) AS MiddleName
    , TRIM(COALESCE(c.value('(x[3]/text())[1]', 'VARCHAR(30)'), RIGHT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), size-pos+1))) AS Suffix
FROM rs CROSS APPLY xmldata.nodes('/root') AS t(c)
    CROSS APPLY (SELECT CHARINDEX(SPACE(1), c.value('(x[2]/text())[1]', 'VARCHAR(30)'))
            , LEN(c.value('(x[2]/text())[1]', 'VARCHAR(30)'))) AS x(pos, size)
)
-- Blank out a "middle name" that is really a suffix, and a "suffix" that is not in the list.
SELECT ID, FullName
    , cte.FirstName
    , IIF(MiddleName IN (SELECT Suffix FROM @suffix), '', cte.MiddleName) AS MiddleName
    , cte.LastName
    , IIF(cte.Suffix NOT IN (SELECT Suffix FROM @suffix), '', cte.Suffix) AS Suffix
FROM cte;

Output

+----+-------------------------+-----------+------------+----------+--------+
| ID |        FullName         | FirstName | MiddleName | LastName | Suffix |
+----+-------------------------+-----------+------------+----------+--------+
|  1 | Johnson,John Johnny,Jr. | John      | Johnny     | Johnson  | Jr.    |
|  2 | Anderson,Andrew A,Sr.   | Andrew    | A          | Anderson | Sr.    |
|  3 | Smith,Smitty Jr.        | Smitty    |            | Smith    | Jr.    |
|  4 | Abegnale,Frank          | Frank     |            | Abegnale |        |
|  5 | Henry,King III          | King      |            | Henry    | III    |
|  6 | Garcia,Jerome John      | Jerome    | John       | Garcia   |        |
+----+-------------------------+-----------+------------+----------+--------+
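
To see the tokenization step in isolation, here is a minimal sketch that applies the same CDATA wrapping to a single value. The variable @name and the Token1 to Token4 aliases are names chosen for this illustration and do not appear in the answer above.

```sql
-- Illustrative only: @name and the TokenN aliases are hypothetical names.
DECLARE @name VARCHAR(100) = 'Johnson,John Johnny,Jr.';
DECLARE @separator CHAR(1) = ',';

-- Wrap every comma-delimited piece in its own <x> element.
-- The CDATA sections keep characters such as & or < from breaking the XML cast.
DECLARE @xml XML = TRY_CAST('<root><x><![CDATA[' +
        REPLACE(@name, @separator, ']]></x><x><![CDATA[') +
    ']]></x></root>' AS XML);

-- Each token is addressable by position; a position with no token yields NULL.
SELECT c.value('(x[1]/text())[1]', 'VARCHAR(30)') AS Token1,  -- Johnson
       c.value('(x[2]/text())[1]', 'VARCHAR(30)') AS Token2,  -- John Johnny
       c.value('(x[3]/text())[1]', 'VARCHAR(30)') AS Token3,  -- Jr.
       c.value('(x[4]/text())[1]', 'VARCHAR(30)') AS Token4   -- NULL
FROM @xml.nodes('/root') AS t(c);
```

Because the comma is the only separator at this stage, the first name and the middle name (or a suffix with no preceding comma) arrive together in the second token, which is why the full query goes on to split that token at its first space.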

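After a bit of a headache and one SQL Server bug, here is a possible solution you can try.

It is somewhat verbose and, as already suggested, it does require you to know the possible suffixes in advance.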
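
The query reads from two tables, Names and Suffixes, whose definitions are not shown in the answer. A minimal setup might look like the sketch below; the column types and the choice of permanent tables are assumptions, and the rows are the sample names from the question.

```sql
-- Assumed schema: only the table and column names come from the query itself.
CREATE TABLE Names (Fullname VARCHAR(100));
INSERT INTO Names (Fullname) VALUES
('Johnson,John Johnny,Jr.'),
('Anderson,Andrew A,Sr.'),
('Smith,Smitty Jr.'),
('Abegnale,Frank'),
('Henry,King III'),
('Garcia,Jerome John');

-- The suffix list must be known up front; extend it as needed.
-- Values keep their trailing period because the query compares whole tokens.
CREATE TABLE Suffixes (Suffix VARCHAR(10));
INSERT INTO Suffixes (Suffix) VALUES ('Jr.'), ('Sr.'), ('III');
```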

with 
    -- Tally of candidate character positions 1..30.
    num as (
        select top(30) Row_Number() over(order by (select null)) n
        from sys.messages
    ),
    -- One row per token, splitting on both commas and spaces.
    parts as (
        select top(1000) n.Fullname, w.n, /*sql server optimizer issue workaround*/
            IsNull(Iif(CharIndex(',', name1)=0 and CharIndex(' ', name1)=0, name1, null),
                   Iif(CharIndex(',', name2)=0 and CharIndex(' ', name2)=0, name2, null)) part
        from Names n
        cross apply (
            select num.n,
                Substring(n.Fullname, num.n, CharIndex(',', n.Fullname + ',', num.n) - num.n) name1,
                Substring(n.Fullname, num.n, CharIndex(' ', n.Fullname + ' ', num.n) - num.n) name2
            from num
            where num.n<DataLength(n.Fullname) and Substring(',' + n.Fullname, num.n, 1) in (',', ' ')
        )w
    )
select Fullname,
    Max(FirstName) firstName,
    Max(Iif(MiddleName=FirstName or MiddleName=Lastname, null, MiddleName)) MiddleName,
    Max(Lastname) LastName,
    Max(Suffix) Suffix
from (
    -- Classify each token by its position and by whether it matches a known suffix.
    select Fullname,
        case when Lag(n) over(partition by Fullname order by n)=Min(n) over(partition by Fullname) then part end FirstName,
        case when Lead(n) over(partition by Fullname order by n)=Max(n) over(partition by Fullname) and Suffix is null 
            or n=Max(n) over(partition by Fullname) and Suffix is null then part end MiddleName,
        case when n=Min(n) over(partition by Fullname) then part end Lastname,
        case when n=Max(n) over(partition by Fullname) and Suffix is not null then part end Suffix
    from parts p
    left join Suffixes s on s.Suffix=p.part
    where p.part !=''
)x
group by Fullname
order by Fullname

See this working fiddle example.