Postgres is not using an index to sort data

Problem description

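I have a table learners with about 3.2 million rows. The table holds user-related information such as name and email. I need to optimize some queries that use ORDER BY on certain columns, so for testing I created a temp_learners table with 800,000 rows. I created two indexes on this table: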

CREATE UNIQUE INDEX "temp_learners_companyId_userId_idx"
  ON temp_learners ("companyId" ASC,"userId" ASC,"learnerUserName" ASC,"learnerEmailId" ASC);

and

CREATE INDEX temp_learners_company_name_email_index
  ON temp_learners ("companyId","learnerUserName","learnerEmailId");

The second index exists only for testing. Now, when I run this query:

SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431 AND "userId" IN (
                                                        4990609084216745771,4990610022492247987,4990609742667096366,4990609476136523663,5451985767018841230,5451985767078553638,5270390122102920730,4763688819142650938,5056979692501246449,5279569274741647114,5031660827132289520,4862889373349389098,5299864070077160421,4740222596778406913,5320170488686569878,5270367618320474818,5320170488587895729,5228888485293847415,4778050469432720821,5270392314970177842,4849087862439244546,5270392117430427860,5270351184072717902,5330263074228870897,4763688829301614114,4763684609695916489,5270390232949727716
  ) ORDER BY "learnerUserName","learnerEmailId";

the query plan chosen by the database is:

Sort  (cost=138.75..138.76 rows=4 width=1581) (actual time=0.169..0.171 rows=27 loops=1)
  Sort Key: "learnerUserName","learnerEmailId"
  Sort Method: quicksort  Memory: 73kB
  ->  Index Scan using "temp_learners_companyId_userId_idx" on temp_learners  (cost=0.55..138.71 rows=4 width=1581) (actual time=0.018..0.112 rows=27 loops=1)
        Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("userId" = ANY ('{4990609084216745771,5270390232949727716}'::bigint[])))
Planning time: 0.116 ms
Execution time: 0.191 ms

Here the index is not used for sorting. But when I run this query

SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431
   ORDER BY "learnerUserName","learnerEmailId" limit 500;

the index is used for sorting:

Limit  (cost=0.42..1360.05 rows=500 width=1581) (actual time=0.018..0.477 rows=500 loops=1)
  ->  Index Scan using temp_learners_company_name_email_index on temp_learners  (cost=0.42..332639.30 rows=122327 width=1581) (actual time=0.018..0.442 rows=500 loops=1)
        Index Cond: ("companyId" = '909666665757230431'::bigint)
Planning time: 0.093 ms
Execution time: 0.513 ms

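What I cannot understand is why Postgres does not use the index in the first query. I should also clarify that the normal use case for the learners table is to join it with other tables, so the first query I wrote is closer to what a join does. For example,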

SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431 AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName","learnerEmailId" limit 5000;

Even after correcting the indexes, the query plan does not use an index for sorting.

QUERY PLAN
Limit  (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.554..173.135 rows=5000 loops=1)
  ->  Sort  (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.553..172.791 rows=5000 loops=1)
        Sort Key: l."learnerUserName",l."learnerEmailId"
        Sort Method: external merge  Disk: 35416kB
        ->  Nested Loop  (cost=1.12..3783.91 rows=44 width=1767) (actual time=0.019..63.743 rows=21195 loops=1)
              ->  Index Scan using primary_index__entity_learners_basic on entity_learners_basic elb  (cost=0.57..1109.79 rows=314 width=186) (actual time=0.010..6.221 rows=21195 loops=1)
                    Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("gameId" = '1050403501267716928'::bigint))
              ->  Index Scan using "temp_learners_companyId_userId_idx" on temp_learners l  (cost=0.55..8.51 rows=1 width=1581) (actual time=0.002..0.002 rows=1 loops=21195)
                    Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("userId" = elb."userId"))
Planning time: 0.309 ms
Execution time: 178.422 ms

Does Postgres not use indexes when joining and sorting data?

Solution

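PostgreSQL could use the index on ("companyId","learnerUserName","learnerEmailId") for the first query, but the additional IN condition reduces the estimated number of result rows to about 4, which means the sort costs next to nothing. So it chooses the index that can support the IN condition.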

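Rows returned through that index will not automatically be in the correct order, because

you specified DESC for the last index column but ASC for the preceding one (the definition now shown in the question uses ASC throughout, so this presumably refers to the index before it was corrected)
you have more than one element in the IN list.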
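
The effect of a multi-element IN list can be sketched on a tiny table. This is only an illustration: demo_learners and its three rows are made up, and the ORDER BY below simply spells out the order in which an index on ("companyId","userId","learnerUserName") stores its entries, rather than forcing an actual index scan.

```sql
-- Hypothetical miniature of temp_learners, for illustration only.
CREATE TEMP TABLE demo_learners (
    "companyId"       bigint,
    "userId"          bigint,
    "learnerUserName" text
);

INSERT INTO demo_learners VALUES
    (1, 10, 'zoe'),
    (1, 20, 'adam'),
    (1, 30, 'mia');

-- An index on ("companyId","userId","learnerUserName") keeps its entries
-- sorted by those columns in that order, so an index scan for two userId
-- values returns rows grouped by "userId" first.
SELECT "userId", "learnerUserName"
FROM demo_learners
WHERE "companyId" = 1 AND "userId" IN (10, 20)
ORDER BY "companyId", "userId", "learnerUserName";
-- Returns (10, 'zoe') before (20, 'adam'): the names are not in
-- alphabetical order, so a separate sort step is still required.
```

With a single "userId" value the rows would already arrive sorted by "learnerUserName"; it is the second and later IN elements that break the ordering.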

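Without the IN condition, enough rows are returned that PostgreSQL considers it cheaper to read them in index order and filter out the rows that do not match the condition.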

For the first query, no single index can support both the IN list in the WHERE condition and the ORDER BY clause, so PostgreSQL has to choose one.

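PostgreSQL chooses the plan it estimates will be faster. Using an index that delivers rows in the right order would mean using a much less selective index, so it does not expect that to be faster overall.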

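If you want to make PostgreSQL believe that sorting is the worst thing in the world, you can set enable_sort=off. If it still sorts after that, you know that PostgreSQL does not have the right indexes to avoid the sort, rather than merely judging that they would not actually be faster.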
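
A minimal sketch of that experiment, assuming the temp_learners table and indexes from the question. The IN list is shortened to two of the original IDs for brevity, and SET LOCAL inside a transaction is just one way to keep the setting from leaking into the rest of the session.

```sql
BEGIN;

-- Penalize explicit sort steps for this transaction only.
-- enable_sort = off does not forbid sorting; it makes Sort nodes look
-- very expensive, so the planner avoids them whenever another plan exists.
SET LOCAL enable_sort = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431
  AND "userId" IN (4990609084216745771, 5270390232949727716)
ORDER BY "learnerUserName", "learnerEmailId";

-- Discard the transaction so the setting reverts.
ROLLBACK;
```

If the resulting plan still contains a Sort node, no available index can deliver the requested order for this query. If the Sort disappears, compare the actual times of both plans to see whether the planner's original choice was in fact the faster one.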
