SQL query: LEFT JOIN / IS NULL join logic

Problem description

I would like to understand the join logic behind the following part of a query. The table being used is shown below.

on t1.log_id-1 = t2.log_id
    where t2.log_id is null

Full query:

select start_id,min(end_id) as end_id
from (
    select t1.log_id as start_id
    from logs as t1
    left join logs as t2
        on t1.log_id-1 = t2.log_id
    where t2.log_id is null
) tt_start
join (
    select t1.log_id as end_id
    from logs as t1
    left join logs as t2
        on t1.log_id+1 = t2.log_id
    where t2.log_id is null
) tt_end
where start_id<=end_id
group by start_id

Table:

Log_id
1
2
3
7
8
10

Solution

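This is a form of not exists logic. It only works if the column tested in the filter (here t2.log_id) can never be NULL in a matching row.

It is much better to use not exists directly, because the optimizer understands it better and can turn it straight into an anti-join. For example: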

where not exists (select 1
    from logs as t2
    where t1.log_id-1 = t2.log_id)
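Put into a complete statement, the fragment above might look like the following. This is only a sketch: it assumes the table is named logs with a single integer column log_id, as in the question, and it rewrites just the start-ID subquery.

```sql
-- Start IDs: rows whose predecessor (log_id - 1) is absent from the table.
select t1.log_id as start_id
from logs as t1
where not exists (
    select 1
    from logs as t2
    where t2.log_id = t1.log_id - 1
);
```

The end-ID subquery would follow the same pattern with t1.log_id + 1.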

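The left join construct is often used by people who don't know better; most optimizer implementations do not recognize it as well as not exists.

For example, in SQL Server, a guarantee that a query-plan subtree returns only one row is very useful for certain optimizations. Because a left join can in theory multiply rows, no such guarantee exists. Even though you and I know that cannot happen here, the optimizer has no logic to work that out.

where is not part of the join logic. It is a filter that is applied only after the join has been evaluated.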

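At first glance, the combination of ON t1.log_id-1 = t2.log_id and WHERE t2.log_id IS NULL looks as if it should give you zero rows: if t2.log_id is null, it cannot also be one less than t1.log_id. That would be true for an inner join. A left join, however, keeps every t1 row that has no match and fills the t2 columns with NULL, and those are exactly the rows the WHERE clause retains.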
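To see what the WHERE filter actually receives, it can help to look at the left join's output before any filtering. The sketch below assumes the question's table is named logs with one integer column log_id; the create and insert statements are assumed setup and are not part of the original post.

```sql
-- Assumed setup matching the table in the question.
create table logs (log_id int primary key);
insert into logs (log_id) values (1), (2), (3), (7), (8), (10);

-- The join without the WHERE filter: unmatched t1 rows get NULL for t2.
select t1.log_id as log_id, t2.log_id as prev_id
from logs as t1
left join logs as t2
    on t1.log_id - 1 = t2.log_id
order by t1.log_id;
-- log_id | prev_id
--      1 | NULL
--      2 | 1
--      3 | 2
--      7 | NULL
--      8 | 7
--     10 | NULL
```

Adding where t2.log_id is null then keeps only the rows for 1, 7 and 10, the IDs with no direct predecessor.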

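This is a combination of a self-join and an anti-join.

Self-join: a table is joined with itself (here on rows whose ID is one lower or one higher).
Anti-join: a left outer join plus a WHERE clause that keeps only the outer-joined rows, so that all rows of the left table without a match are retained. This is a fairly common technique on young DBMSs, where joins are already well optimized while the more direct approaches, NOT EXISTS and NOT IN, are not.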

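What the query does:

Find the IDs that have no direct predecessor. E.g. for the IDs 1, 2, 4, 5, 6, 8, 10, 12, 23, 24 we find 1, 4, 8, 10, 12 and 23.
Find the IDs that have no direct successor. E.g. for the IDs 1, 2, 4, 5, 6, 8, 10, 12, 23, 24 we find 2, 6, 8, 10, 12 and 24.
Join the former with the latter where the former <= the latter.
Get the minimum end ID for each start ID: 1-2, 4-6, 8-8, 10-10, 12-12, 23-24.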

The query thus finds ranges of consecutive numbers: 1,2,4,5,6,8,10,12,23,24 = 1-2, 4-6, 8, 10, 12, 23-24.
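The same steps can be traced on the question's own table (1, 2, 3, 7, 8, 10). The sketch below keeps the left join form of the original query but moves the range condition into an explicit ON clause. For an inner join this should be equivalent to the JOIN ... WHERE form, and DBMSs such as PostgreSQL and SQL Server require the ON clause. The intermediate results in the comments assume the table holds exactly those six rows.

```sql
select tt_start.start_id, min(tt_end.end_id) as end_id
from (
    -- Step 1: IDs with no direct predecessor -> 1, 7, 10
    select t1.log_id as start_id
    from logs as t1
    left join logs as t2
        on t1.log_id - 1 = t2.log_id
    where t2.log_id is null
) as tt_start
join (
    -- Step 2: IDs with no direct successor -> 3, 8, 10
    select t1.log_id as end_id
    from logs as t1
    left join logs as t2
        on t1.log_id + 1 = t2.log_id
    where t2.log_id is null
) as tt_end
    -- Step 3: pair each start with every end at or after it
    on tt_start.start_id <= tt_end.end_id
-- Step 4: the nearest end closes the range
group by tt_start.start_id
order by tt_start.start_id;
-- start_id | end_id
--        1 | 3
--        7 | 8
--       10 | 10
```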

This kind of task is called a gaps-and-islands problem. Most of the time these are solved with window functions:

-- Consecutive IDs share the same value of log_id - row_number().
select min(log_id) as start_id, max(log_id) as end_id
from
(
  select
    log_id,
    log_id - row_number() over (order by log_id) as grp
  from logs
) grouped
group by grp
order by grp;
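Why the subtraction works is easiest to see by listing the intermediate values. The following sketch runs the inner query on its own with the row number shown as an extra column (rn is a name added here for illustration); the values in the comments assume the question's six rows 1, 2, 3, 7, 8, 10.

```sql
select
    log_id,
    row_number() over (order by log_id) as rn,
    log_id - row_number() over (order by log_id) as grp
from logs
order by log_id;
-- log_id | rn | grp
--      1 |  1 |   0
--      2 |  2 |   0
--      3 |  3 |   0
--      7 |  4 |   3
--      8 |  5 |   3
--     10 |  6 |   4
```

Within a run of consecutive IDs both log_id and rn grow by one per row, so their difference stays constant. Each gap increases the difference, which starts a new group, and grouping by grp yields 1-3, 7-8 and 10-10.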

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3eaeb881c8e5498a02fa0ff34f4cffc3
