Why does Postgres spend so much time on an index-only scan?

Problem description

Postgres version: 12

EXPLAIN (ANALYZE true, VERBOSE true, COSTS true, BUFFERS true, TIMING true) SELECT MIN("id"),MAX("id") FROM "public"."hotel_slot_inventory" WHERE ( "updated_at" >= '2021-03-02 13:30:03' AND "updated_at"

Query plan:

 Result  (cost=512.17..512.18 rows=1 width=8) (actual time=65556.920..65556.926 rows=1 loops=1)
   Output: $0,$1
   Buffers: shared hit=370 read=454012 written=8
   I/O Timings: read=62266.717 write=0.194
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..256.09 rows=1 width=4) (actual time=65251.998..65252.001 rows=1 loops=1)
           Output: hotel_slot_inventory.id
           Buffers: shared hit=1 read=453546 written=8
           I/O Timings: read=61967.042 write=0.194
           ->  Index Only Scan using hotel_slot_inventory_id_updated_at_idx on public.hotel_slot_inventory  (cost=0.57..3291347.07 rows=12881 width=4) (actual time=65251.996..65251.997 rows=1 loops=1)
                 Output: hotel_slot_inventory.id
                 Index Cond: ((hotel_slot_inventory.id IS NOT NULL) AND (hotel_slot_inventory.updated_at >= '2021-03-02 13:30:03'::timestamp without time zone) AND (hotel_slot_inventory.updated_at < '2021-03-03 06:15:19.127884'::timestamp without time zone))
                 Heap Fetches: 1
                 Buffers: shared hit=1 read=453546 written=8
                 I/O Timings: read=61967.042 write=0.194
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.57..256.09 rows=1 width=4) (actual time=304.902..304.903 rows=1 loops=1)
           Output: hotel_slot_inventory_1.id
           Buffers: shared hit=369 read=466
           I/O Timings: read=299.674
           ->  Index Only Scan Backward using hotel_slot_inventory_id_updated_at_idx on public.hotel_slot_inventory hotel_slot_inventory_1  (cost=0.57..3291347.07 rows=12881 width=4) (actual time=304.899..304.899 rows=1 loops=1)
                 Output: hotel_slot_inventory_1.id
                 Index Cond: ((hotel_slot_inventory_1.id IS NOT NULL) AND (hotel_slot_inventory_1.updated_at >= '2021-03-02 13:30:03'::timestamp without time zone) AND (hotel_slot_inventory_1.updated_at < '2021-03-03 06:15:19.127884'::timestamp without time zone))
                 Heap Fetches: 3892
                 Buffers: shared hit=369 read=466
                 I/O Timings: read=299.674
 Planning Time: 0.229 ms
 Execution Time: 65556.982 ms
(28 rows)

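As we can see, this simple index-only scan query took 65556.982 ms, and InitPlan 1 accounts for most of that time at 65251.997 ms. Why? Since the query asks only for MIN and MAX, it should just need to fetch the first entry from a forward scan and from a backward scan of the btree index. It should not need to read every matching entry from the index.
FYI: VACUUM did not help much.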

Edit: index bloat details:

Real size: 3751411712 = 3.49 GB

Extra size: 470237184 = 448 MB

Extra ratio: 12.53

Fillfactor: 90

Bloat size: 107053056 = 102 MB

Bloat ratio: 2.85

Table bloat: bloat size: 475283456 = 453 MB

Bloat ratio: 5.088

Solution

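Your index must be on (id, updated_at, ...). Note that such an index cannot be used to read only the time range in question, because updated_at is not the first column in the index and the first column is not constrained by equality. So the scan starts at one end of the index and walks it until it finds an entry that meets the time condition. I call this an in-index filter. Evidently that means a lot of scanning in the forward direction, because none of the entries at the low-id end of the index meet the time condition. The planner does not understand this, however: it assumes the 12881 rows it expects to match are scattered evenly throughout the index, not all clustered at one end.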

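Unlike a regular (out-of-index) filter, the plan does not report how many entries were evaluated and then rejected by the in-index filter. This makes the plan a bit harder to interpret. The only hint here is the buffer count: InitPlan 1 read 453546 shared buffers to return a single row.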
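
To make the column-order point concrete, here is a minimal sketch of an index that leads with updated_at, so that a scan can seek straight to the requested time range instead of filtering entries while walking the index in id order. This is an assumption on my part, not something tested against this table, and the index name is hypothetical. The time bounds are the ones shown in the Index Cond of the plan.

```sql
-- Hypothetical index name. Leading with updated_at lets the btree descend
-- directly to the first entry with updated_at >= the lower bound.
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY hotel_slot_inventory_updated_at_id_idx
    ON public.hotel_slot_inventory (updated_at, id);

-- Same aggregate as in the question, re-run to compare plans and buffer counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MIN(id), MAX(id)
FROM public.hotel_slot_inventory
WHERE updated_at >= '2021-03-02 13:30:03'
  AND updated_at <  '2021-03-03 06:15:19.127884';
```

With an index like this, the scan would read every entry in the time range and aggregate them rather than stopping at the first match, but that range is bounded. Whether the planner actually chooses it depends on its cost estimates, since it still believes the existing plan is cheap.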

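There are two pieces of evidence for this interpretation. One is that a Limit node is present at all, even though your query specifies no LIMIT; it can only be there to support the min and max aggregates. The other is the IS NOT NULL injected into the Index Cond, which is not in your query. I don't know exactly why it does that, but it does indicate an in-index filter (or sometimes a partial index) rather than ordinary index usage.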
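
For illustration, both pieces of evidence are consistent with the planner's MIN/MAX optimization, which plans each aggregate roughly as an ORDER BY ... LIMIT 1 subquery over the index. As far as I understand, the IS NOT NULL is added because min and max ignore NULLs. The queries below are a hand-written approximation of the two InitPlans, not the planner's literal output, using the time bounds from the Index Cond in the plan.

```sql
-- Roughly what InitPlan 1 computes (the MIN side): walk the index in
-- ascending id order and stop at the first entry that passes the time filter.
SELECT id
FROM public.hotel_slot_inventory
WHERE id IS NOT NULL
  AND updated_at >= '2021-03-02 13:30:03'
  AND updated_at <  '2021-03-03 06:15:19.127884'
ORDER BY id ASC
LIMIT 1;

-- Roughly what InitPlan 2 computes (the MAX side): the same filter,
-- but walking the index backward from the highest id.
SELECT id
FROM public.hotel_slot_inventory
WHERE id IS NOT NULL
  AND updated_at >= '2021-03-02 13:30:03'
  AND updated_at <  '2021-03-03 06:15:19.127884'
ORDER BY id DESC
LIMIT 1;
```

Running each of these under EXPLAIN (ANALYZE, BUFFERS) should show which direction pays for the long walk before the first match is found.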

