Efficient SQL query to find gaps in sequential numeric data (MySQL)

Problem

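I have a table with a "time" column (INT unsigned) where each row represents one second, and I need to find the gaps in time (missing seconds).
I have already tried this query (it finds the first time before a gap):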

SELECT t1.time
FROM `table` AS t1
LEFT JOIN `table` AS t2 ON t2.time=(t1.time+1)
WHERE t2.time IS NULL
ORDER BY t1.time ASC
LIMIT 1

It works, but it is too slow for a large table (close to 100M rows).
Is there a faster solution?

EXPLAIN output:

[image not preserved]

SHOW CREATE TABLE:

CREATE TABLE `candles` (
  `time` int(10) unsigned NOT NULL,
  `open` float unsigned NOT NULL,
  `high` float unsigned NOT NULL,
  `low` float unsigned NOT NULL,
  `close` float unsigned NOT NULL,
  `vb` int(10) unsigned NOT NULL,
  `vs` int(10) unsigned NOT NULL,
  `Trades` int(10) unsigned NOT NULL,
  PRIMARY KEY (`time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Answers

In MySQL 5.7, this is a use case where user variables can come in handy:

select max(time)
from (
    select t.time,@rn := @rn + 1 as rn 
    from (select time from mytable order by time) t
    cross join (select @rn := 0) r
) t
group by time - rn

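This treats the question as a gaps-and-islands problem. The idea is to identify groups of records where time increases without gaps (islands). To do that, we assign each row an incrementing id, ordered by time; whenever the difference between time and that id changes, you know there is a gap. The query returns the last time of each island, that is, the time just before each gap, plus the overall maximum time.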
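To make the "time minus row number" trick concrete, here is a minimal Python sketch of the same grouping logic outside the database. The function name `island_ends` and the sample values are made up for illustration, and it assumes the times are unique (as they are under the PRIMARY KEY in the question).

```python
from itertools import groupby

def island_ends(times):
    """Return the last timestamp of each gap-free run (island)."""
    ordered = sorted(times)
    # Pair each value with a 1-based row number, like @rn in the query.
    numbered = [(t, rn) for rn, t in enumerate(ordered, start=1)]
    # time - rn stays constant inside an island and jumps after every gap.
    groups = groupby(numbered, key=lambda pair: pair[0] - pair[1])
    return [max(t for t, _ in group) for _, group in groups]

sample = [100, 101, 102, 105, 106, 110]  # hypothetical timestamps
print(island_ends(sample))  # [102, 106, 110]
```

Each returned value is the last second before a gap, except the final one, which is simply the maximum time in the data.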

If the database version is 8.0, you can use a recursive common table expression, for example:

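-- Note: cte_max_recursion_depth defaults to 1000 in MySQL 8.0; raise it for wider ranges.
WITH RECURSIVE cte AS 
(
  -- Start at the first timestamp rather than 1, so only the data range is generated.
  SELECT MIN(time) AS n FROM tab
  UNION ALL
  SELECT n + 1
    FROM cte
   WHERE cte.n < (SELECT MAX(time) FROM tab )
)
-- Every generated value with no matching row is a missing second.
SELECT n AS gaps
  FROM cte
  LEFT JOIN tab
    ON n=time
 WHERE time IS NULL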

In MySQL 8, you can use LEAD():

select time from (
    select time,lead(time,1) over (order by time) next_time
    from `table`
) t
where time+1 != next_time

In earlier versions, I would probably do this:

select prev_time as time from (
    select @prev_time+0 as prev_time,if(@prev_time:=time,time,time) as time
    from (select @prev_time:=null) initvars
    cross join (select time from `table` order by time) t
) t
where time != prev_time+1

Neither of these includes the maximum time, which your original query does.
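The difference in the last row can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL 8, which is an assumption made for convenience: it needs SQLite 3.25 or newer for LEAD(), and the table name `candles` and the sample values are illustrative only. An outer ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (time INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO candles (time) VALUES (?)",
    [(t,) for t in (100, 101, 102, 105, 106, 110)],  # hypothetical data
)

# Window-function version: the last row has next_time = NULL, so the
# comparison yields NULL and that row is filtered out.
lead_query = """
    SELECT time FROM (
        SELECT time, LEAD(time, 1) OVER (ORDER BY time) AS next_time
        FROM candles
    ) t
    WHERE time + 1 != next_time
    ORDER BY time
"""

# Self-join version from the question (without LIMIT 1): the maximum
# time has no successor row, so it is reported as well.
left_join_query = """
    SELECT t1.time
    FROM candles AS t1
    LEFT JOIN candles AS t2 ON t2.time = t1.time + 1
    WHERE t2.time IS NULL
    ORDER BY t1.time
"""

lead_result = [row[0] for row in conn.execute(lead_query)]
join_result = [row[0] for row in conn.execute(left_join_query)]
print(lead_result)  # [102, 106]
print(join_result)  # [102, 106, 110]
```

If the trailing maximum matters, one option is to change the filter to also accept rows where next_time IS NULL.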

I think the GROUP BY needed to treat this as a strict gaps-and-islands problem would be too expensive for this many records.
