Problem description
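I need help optimizing the subquery below. In short, I have the following query, in which the tree table is joined to the branch table on s_id and on the maximum timestamp of the branch table, as determined by the subquery condition.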
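I am happy with the results this query returns. However, the query is very slow. The bottleneck is the dependent subquery (branch2), which examines more than 14,000 rows. How can I optimize the subquery to speed the query up?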
SELECT *
FROM dept.tree tree
LEFT JOIN dept.branch branch ON tree.s_id = branch.s_id
AND branch.timestamp =
(
SELECT MAX(timestamp)
FROM dept.branch branch2
WHERE branch2.s_id = tree.s_id
AND branch2.timestamp <= tree.timestamp
)
WHERE tree.timestamp BETWEEN CONVERT_TZ('2020-05-16 00:00:00','America/Toronto','UTC')
AND CONVERT_TZ('2020-05-16 23:59:59','America/Toronto','UTC')
AND tree.s_id IN ('459','460')
ORDER BY tree.timestamp ASC;
Table tree:
id         Box_id  timestamp
373001645  1       2020-05-07 06:00:20
373001695  1       2020-05-07 06:02:26
373001762  1       2020-05-07 06:05:17
373001794  1       2020-05-07 06:06:38
373001810  2       2020-05-07 06:07:21
Table branch:
id         Box_id  timestamp            data
373001345  1       2020-05-07 06:00:20  {"R": 0.114,"H": 20.808}
373001395  1       2020-05-07 06:02:26  {"R": 0.12,"H": 15.544}
373001462  1       2020-05-07 06:03:01  {"R": 0.006,"H": 55.469}
373001494  1       2020-05-07 06:04:38  {"R": 0.004,"H": 51.85}
373001496  1       2020-05-07 06:05:18  {"R": 0.02,"H": 5.8965}
373001497  1       2020-05-07 06:06:39  {"R": 0.12,"H": 54.32}
373001510  2       2020-05-07 06:07:09  {"R": 0.34,"H": 1.32}
373001511  2       2020-05-07 06:07:29  {"R": 0.56,"H": 32.7}
The branch table has indexes on s_id and timestamp.
I am using version 5.7.25-google-log.
EXPLAIN gives the following:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY tree range unique_timestamp_s_id,idx_s_id_timestamp,idx_timestamp idx_s_id_timestamp 10 2629 100.00 Using index condition; Using filesort
1 PRIMARY branch ref unique_timestamp_s_id,idx_timestamp unique_timestamp_s_id 5 func 1 100.00 Using where
2 DEPENDENT SUBQUERY branch2 ref unique_timestamp_s_id,idx_timestamp idx_s_id_timestamp 5 tree.s_id 14122 33.33 Using where; Using index
Solution
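Please provide SHOW CREATE TABLE.
branch needs INDEX(s_id, timestamp).
Do you need the LEFT? It may be slowing the query down for no reason.
The combination of IN on one column and BETWEEN on another may not be optimized well; what version are you running?
Please provide the EXPLAIN SELECT so we can discuss whether it is being optimized. If it is not, we can discuss how to turn the IN (a variant of OR) into a UNION.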
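As a rough sketch of the first two points, the statements below show how the table definitions could be gathered and how the composite index could be added. The index name idx_branch_s_id_timestamp is made up for illustration; the EXPLAIN output in the question suggests an index called idx_s_id_timestamp may already exist, in which case the ALTER TABLE is unnecessary.

```sql
-- Show the full definitions, including every existing index.
SHOW CREATE TABLE dept.tree;
SHOW CREATE TABLE dept.branch;

-- Assumption: no existing index on branch starts with (s_id, timestamp).
-- The index name here is hypothetical; pick one that fits your naming scheme.
ALTER TABLE dept.branch
    ADD INDEX idx_branch_s_id_timestamp (s_id, timestamp);
```

With s_id first and timestamp second, the correlated MAX(timestamp) lookup for one s_id can be answered from the index alone, without visiting the table rows.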
This may actually be faster than what I was thinking of above...
Add the index above, then rewrite the query substantially:
SELECT b.*
FROM ( SELECT s_id, MAX(timestamp) AS timestamp  -- latest branch row per s_id in the window
       FROM dept.branch
       WHERE timestamp BETWEEN
                 CONVERT_TZ('2020-05-16 00:00:00','America/Toronto','UTC')
             AND CONVERT_TZ('2020-05-16 23:59:59','America/Toronto','UTC')
         AND s_id IN ('459','460')
       GROUP BY s_id  -- needed because s_id is selected alongside an aggregate
     ) AS x
JOIN dept.branch AS b USING(s_id,timestamp);
First, check whether this returns the correct information. Then I will explain how to do the UNION in the subquery (if you need help).
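The UNION idea mentioned above could look roughly like the sketch below: each s_id from the IN list gets its own branch with a plain equality, so each branch is a single range scan on an (s_id, timestamp) index. This is an illustration under the assumption that both CONVERT_TZ calls are meant to convert from 'America/Toronto' to 'UTC'; it is not the answerer's own code, and whether it beats the IN version should be checked with EXPLAIN.

```sql
SELECT b.*
FROM (
       -- One branch per s_id value; UNION ALL is enough because the
       -- two branches can never return the same s_id.
       ( SELECT s_id, MAX(timestamp) AS timestamp
         FROM dept.branch
         WHERE s_id = '459'
           AND timestamp BETWEEN
                   CONVERT_TZ('2020-05-16 00:00:00','America/Toronto','UTC')
               AND CONVERT_TZ('2020-05-16 23:59:59','America/Toronto','UTC')
         GROUP BY s_id )
       UNION ALL
       ( SELECT s_id, MAX(timestamp) AS timestamp
         FROM dept.branch
         WHERE s_id = '460'
           AND timestamp BETWEEN
                   CONVERT_TZ('2020-05-16 00:00:00','America/Toronto','UTC')
               AND CONVERT_TZ('2020-05-16 23:59:59','America/Toronto','UTC')
         GROUP BY s_id )
     ) AS x
JOIN dept.branch AS b USING (s_id, timestamp);
```

The list of branches grows with the number of s_id values, so this form is only practical when the IN list is short.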
This should be faster:
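SELECT
    tree.s_id, tree.timestamp, branch.data
FROM
    (
        -- For each tree row, find the latest branch timestamp at or before it.
        -- tree.timestamp is selected here so the outer query can reference it.
        SELECT
            tree.s_id, tree.timestamp,
            MAX(branch.timestamp) AS max_branch_timestamp
        FROM
            dept.tree tree
            LEFT JOIN dept.branch branch
                ON (
                    branch.s_id = tree.s_id
                    AND branch.timestamp <= tree.timestamp
                )
        WHERE
            tree.timestamp BETWEEN
                CONVERT_TZ('2020-05-16 00:00:00','America/Toronto','UTC') AND
                CONVERT_TZ('2020-05-16 23:59:59','America/Toronto','UTC')
            AND tree.s_id IN ('459','460')
        GROUP BY tree.s_id, tree.timestamp
    ) tree
    -- Join back to branch to fetch the data stored at that timestamp.
    LEFT OUTER JOIN dept.branch branch
        ON (
            branch.s_id = tree.s_id
            AND branch.timestamp = tree.max_branch_timestamp
        );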