PSQL - Query performance when filtering on an interval stored in two separate fields

Problem description

I have a PostgreSQL table that contains time intervals.

This is a simplified structure of my table:

CREATE TABLE intervals (
  name        varchar(40),
  time_from   timestamp,
  time_to     timestamp
);

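The table contains several million records, but if you filter for a specific point in time in the past, the number of records for which

time_from <= [requested time] <= time_to

holds is always very limited (no more than 3k results). So a query like this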

SELECT *
FROM intervals
WHERE time_from <= '2020-01-01T10:00:00' and time_to >= '2020-01-01T10:00:00'

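should return a relatively small number of results and, in theory, should be fast if I use the right index. But it is not fast at all.

I tried adding a composite index on time_from and time_to, but the planner does not choose it.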

Seq Scan on intervals  (cost=0.00..156152.46 rows=428312 width=32) (actual time=13.223..3599.840 rows=4981 loops=1)
  Filter: ((time_from <= '2020-01-01T10:00:00') AND (time_to >= '2020-01-01T10:00:00'))
  Rows Removed by Filter: 2089650
Planning Time: 0.159 ms
Execution Time: 3600.618 ms

What type of index should I add to speed up this query?

Solution

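A btree index is not very efficient here. It can quickly discard everything with time_from > '2020-01-01T10:00:00', but that is probably not a large part of the table (at least not if your table goes back many years). Once the first column of the index has been consumed by a range condition like this, the next column cannot be used very efficiently. The index can only jump to a specific part of the time_to values within groups of rows that share the same time_from value (ties), which is not very useful because there are probably not many ties. (At least, the planner cannot prove otherwise to itself at the time it plans the query.)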
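To see how little the leading btree column narrows things down on your own data, a rough check along these lines may help. It is only a sketch: it reuses the table and timestamp from the question, the column aliases are made up for illustration, and it scans the whole table, so it is meant as a one-off diagnostic.

```sql
-- Compare how many rows survive the first condition alone
-- with how many survive both conditions.
SELECT
  count(*) AS total_rows,
  count(*) FILTER (WHERE time_from <= '2020-01-01T10:00:00') AS pass_time_from,
  count(*) FILTER (WHERE time_from <= '2020-01-01T10:00:00'
                     AND time_to   >= '2020-01-01T10:00:00') AS pass_both
FROM intervals;
```

If pass_time_from is close to total_rows while pass_both is tiny, a btree on (time_from, time_to) would still have to walk most of the index, which is consistent with the planner preferring a sequential scan.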

What you need is a GiST index, which is designed for this kind of multi-dimensional problem:

create extension btree_gist ;
create index on intervals using gist (time_from,time_to);
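After creating the index, it is worth confirming that the planner actually picks it up. A minimal way to check, assuming the table and query from the question, is to refresh the statistics and look at the plan again:

```sql
-- Refresh planner statistics so row estimates reflect the current data.
ANALYZE intervals;

-- Re-run the original query with timing and buffer information.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM intervals
WHERE time_from <= '2020-01-01T10:00:00'
  AND time_to   >= '2020-01-01T10:00:00';
```

If the index is used, the Seq Scan node should be replaced by an index scan or bitmap scan on the new GiST index. The exact plan shape and timings depend on your data and PostgreSQL version.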

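This index will support the query as you wrote it. Another possibility is to build time ranges and index those, rather than indexing the separate start and end points.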

-- this one does not need btree_gist.
create index on intervals using gist (tsrange(time_from,time_to));

But this index forces you to write the query differently:

SELECT * FROM intervals
WHERE tsrange(time_from,time_to) @> '2020-01-01T10:00:00'::timestamp
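One detail to keep in mind: the two-argument form of tsrange builds a range with an inclusive lower bound and an exclusive upper bound, so the query above leaves out rows where time_to equals the requested time, unlike the original time_to >= condition. If you want both ends inclusive, a variant along these lines should work. It is a sketch, and the index expression and the query expression must match exactly for the index to be usable:

```sql
-- Index closed ranges: both time_from and time_to are included.
create index on intervals using gist (tsrange(time_from, time_to, '[]'));

-- Query with the same expression as the index definition.
SELECT * FROM intervals
WHERE tsrange(time_from, time_to, '[]') @> '2020-01-01T10:00:00'::timestamp;
```

Note that tsrange raises an error if time_from is greater than time_to, so this approach assumes every row has a well-ordered interval.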
