sql-server – 将数据集与EXCEPT相结合,而不是在LEFT JOIN中检查IS NULL

我目前正在通过Microsoft sql Server 2008 – 数据库开发(MCTS考试70-433)认证.在前面有关组合数据集的章节中,我遇到了EXCEPT(和INTERSECT)命令.一个示例显示如何使用EXCEPT从一个表中获取第二个表中没有相关值的所有值,如下所示:
SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales

EXCEPT命令对我来说是新的,但是根据我今天所知的情况,我仍然可以使用LEFT JOIN轻松解决问题,并通过以下方式检查连接约束上的IS NULL:

SELECT disTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL

现在我开始怀疑其中哪一个具有最佳性能.我试着研究查询执行计划,但我不太擅长阅读这些,所以它并没有让我更聪明.对于使用EXCEPT的查询,计划如下所示:

|--nested Loops(Left Anti Semi Join,OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey],[Expr1006]) WITH UnorDERED PREFETCH)
    |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
    |--Top(TOP EXPRESSION:((1)))
        |--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]),SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)

对于使用LEFT JOIN的人,它看起来像这样:

|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
    |--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
        |--Merge Join(Left Outer Join,MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]),RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
            |--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]),ORDERED FORWARD)
            |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]),ORDERED FORWARD)

查询中使用的表来自AdventureWorksDW2008示例数据库,因此下面我还包括两个表的创建脚本,以备正确回答问题所需的情况:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[DimEmployee]    Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DimEmployee](
    [EmployeeKey] [int] IDENTITY(1,1) NOT NULL,[ParentEmployeeKey] [int] NULL,[EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,[ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,[SalesTerritoryKey] [int] NULL,[FirstName] [nvarchar](50) NOT NULL,[LastName] [nvarchar](50) NOT NULL,[MiddleName] [nvarchar](50) NULL,[NameStyle] [bit] NOT NULL,[Title] [nvarchar](50) NULL,[HireDate] [date] NULL,[BirthDate] [date] NULL,[LoginID] [nvarchar](256) NULL,[EmailAddress] [nvarchar](50) NULL,[Phone] [nvarchar](25) NULL,[MaritalStatus] [nchar](1) NULL,[EmergencyContactName] [nvarchar](50) NULL,[EmergencyContactPhone] [nvarchar](25) NULL,[SalariedFlag] [bit] NULL,[Gender] [nchar](1) NULL,[PayFrequency] [tinyint] NULL,[BaseRate] [money] NULL,[VacationHours] [smallint] NULL,[SickLeaveHours] [smallint] NULL,[CurrentFlag] [bit] NOT NULL,[SalesPersonFlag] [bit] NOT NULL,[DepartmentName] [nvarchar](50) NULL,[StartDate] [date] NULL,[EndDate] [date] NULL,[Status] [nvarchar](50) NULL,CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED 
(
    [EmployeeKey] ASC
)WITH (PAD_INDEX  = OFF,STATISTICS_norECOmpuTE  = OFF,IGnorE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON,ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimsalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimsalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimsalesTerritory]
GO

第二个表:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[FactResellerSales]    Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FactResellerSales](
    [ProductKey] [int] NOT NULL,[OrderDateKey] [int] NOT NULL,[DueDateKey] [int] NOT NULL,[ShipDateKey] [int] NOT NULL,[ResellerKey] [int] NOT NULL,[EmployeeKey] [int] NOT NULL,[PromotionKey] [int] NOT NULL,[CurrencyKey] [int] NOT NULL,[SalesTerritoryKey] [int] NOT NULL,[SalesOrderNumber] [nvarchar](20) NOT NULL,[SalesOrderLineNumber] [tinyint] NOT NULL,[RevisionNumber] [tinyint] NULL,[OrderQuantity] [smallint] NULL,[UnitPrice] [money] NULL,[Extendedamount] [money] NULL,[UnitPricediscountPct] [float] NULL,[discountAmount] [float] NULL,[ProductStandardcost] [money] NULL,[TotalProductCost] [money] NULL,[SalesAmount] [money] NULL,[TaxAmt] [money] NULL,[Freight] [money] NULL,[CarrierTrackingNumber] [nvarchar](25) NULL,[CustomerPONumber] [nvarchar](25) NULL,CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
(
    [SalesOrderNumber] ASC,[SalesOrderLineNumber] ASC
)WITH (PAD_INDEX  = OFF,ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimsalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimsalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimsalesTerritory]
GO

这个问题使用了一个非常具体的例子,但我也对一般信息感兴趣,这些信息关于何时/是否适合使用EXCEPT而不是LEFT JOIN并检查IS NULL.

我还注意到,如果我尝试使用INTERCEPT运行第一个查询,第二个查询中的等价物将是使用标准JOIN和SELECT disTINCT DimEmployee.EmployeeKey(根本没有WHERE子句).但是,在这种情况下,执行计划在两种情况下完全相同.

更新
对第二个查询的次要更新(请参阅修订历史记录),这导致对该查询查询计划稍微复杂一些.我猜想更大的查询计划表明查询不太理想,但我仍然希望这个问题得到解答.

解决方法

在LEFT JOIN给出多行的情况下,2个查询将有所不同.也就是说,FactResellerSales是DimEmployee的子代,DimEmployee中每行有很多行.所以你需要为你的JOIN例子注意到disTINCT.

如果您将查询更改为使用NOT EXISTS,则会获得相同的计划(对于NOT EXISTS,左反半连接是典型的)

SELECT EmployeeKey
FROM DimEmployee DE 
WHERE
NOT EXISTS (SELECT * FROM 
        FactResellerSales FRS
    WHERE FRS.EmployeeKey = DE.EmployeeKey)

此外,出于同样的原因,INTERSECT / EXISTS很可能会给出相同的计划.

这是JOIN / EXISTS / IN或OUTER JOIN / NOT EXISTS / NOT IN辩论的另一个方面. INTERSECT / EXCEPT是一个稍微更优雅的构造(NOT)EXISTS你喜欢

编辑:

没有一个明显的问题……

就个人而言,我不使用OUTER JOIN进行“存在”检查:我使用EXISTS或NOT EXISTS(如果我记得的话,使用INTERSECT / EXCEPT),因为你要做的事情更明显. AKA,如果我不需要“外部”表中的行,我不会使用它来避免使用disTINCT.

如果在这种情况下,没有CASE使用OUTER JOIN / IS NULL IMHO.当然,我在需要时使用OUTER JOIN:这个答案仅针对一个特定情况.

相关文章

SELECT a.*,b.dp_name,c.pa_name,fm_name=(CASE WHEN a.fm_n...
if not exists(select name from syscolumns where name=&am...
select a.*,pano=a.pa_no,b.pa_name,f.dp_name,e.fw_state_n...
要在 SQL Server 2019 中设置定时自动重启,可以使用 Window...
您收到的错误消息表明数据库 'EastRiver' 的...
首先我需要查询出需要使用SQL Server Profiler跟踪的数据库标...