ROW_NUMBER performance

Problem description

I have a query that uses ROW_NUMBER(). It contains something like this:

ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum

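The query runs really fast. However, as soon as I add any reference to the rnum column, the query slows to a crawl. So ROW_NUMBER() by itself does not seem to be the problem, but when I actually use rnum in the query, it takes about 30 seconds.

Any ideas?

For reference, here is the query: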

  WITH aquire AS (
    SELECT rtnum,trans_id,source,provider,publish_date,story_link,industry_name,sector_name,subject,teaser,tickers
    FROM (SELECT d.trans_id,d.source,'AquireMedia' AS provider,d.trans_time AS publish_date,'/research/get_news.PHP?id=' || d.trans_id AS story_link,i.name AS industry_name,s.sector_name,d.headline AS subject,NULL AS teaser,NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(d.trans_id,'AQUIREMEDIA') AS tickers,ROW_NUMBER() OVER (PARTITION BY d.trans_id ORDER BY d.trans_time DESC) as rtnum
          FROM   story_descriptions_3m d,story_tickers_3m t,uber_master_mv m,industry i,ind_sector ix,sectors s,comp_ind c
          WHERE  d.trans_id = t.trans_id
            AND  t.m_ticker = m.m_ticker
            AND  t.m_ticker = c.m_ticker(+)
            AND  c.ind_code = i.ind_code(+)
            AND  i.ind_code = ix.ind_code(+)
            AND  ix.sector_id = s.sector_id(+)  AND s.sector_id = 10 )
    WHERE rtnum = 1),partner AS (
  SELECT rtnum,tickers
  FROM (SELECT CAST(n.story_id AS VARCHAR2(20)) trans_id,n.provider AS source,'Partner News' AS provider,n.story_date AS publish_date,n.link AS story_link,n.title AS subject,CAST(substr(n.teaser,1,4000) AS VARCHAR2(4000)) AS teaser,NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(n.story_id,'OTHER') AS tickers,ROW_NUMBER() OVER (PARTITION BY n.story_id ORDER BY n.story_date DESC) as rtnum
        FROM   news_stories_3m n,news_stories_lookup_3m t,comp_ind c,sectors s
        WHERE  t.story_id = n.story_id
          AND  t.ticker   = m.ticker
          AND  m.m_ticker = c.m_ticker(+)
          AND  c.ind_code = i.ind_code(+)
          AND  i.ind_code = ix.ind_code(+)
          AND  ix.sector_id = s.sector_id(+)  AND s.sector_id = 10 )
   WHERE rtnum = 1)
  SELECT  trans_id,TO_CHAR(publish_date,'MM/DD/YYYY HH24:MI:SS') AS publish_date,UNIX_TIMESTAMP(publish_date) AS timestamp,tickers
  FROM (SELECT trans_id,tickers,ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
             FROM (SELECT trans_id,tickers
                        FROM   aquire WHERE rtnum <= 5
                        UNION ALL
                        SELECT trans_id,tickers
                        FROM   partner WHERE rtnum <= 5)) 
WHERE rnum BETWEEN 1 AND 1 * 5;

Solution

Let's simulate your query on a simple example to demonstrate and explain why the behavior you observe is to be expected.

Sample data

create table tab1 as
select rownum id,lpad('x',3000,'y') pad from dual connect by level <= 1000000;

Now, if you run the query below in an IDE, you will see the first page of the result set immediately.

Note that you define row_number but do not use it.

select id,pad from (
 select id,pad,row_number() over (order by id) as rnum
 from tab1
)

The answer is in the execution plan below:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  2866M|   135K  (1)| 00:00:06 |
|   1 |  TABLE ACCESS FULL| TAB1 |  1000K|  2866M|   135K  (1)| 00:00:06 |
--------------------------------------------------------------------------

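You can see that no sort and no filter are performed; row_number is simply ignored.

This (fetching only the first few rows, without any sort) explains why the query executes quickly.

In contrast, if you constrain row_number as follows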

select id,pad from (
 select id,pad,
  row_number() over (order by id) as rnum
 from tab1
) where rnum between 1 and 5;

Elapsed: 00:00:07.80

you observe a considerable elapsed time. Again, the execution plan provides the answer.

See here for how to get the execution plan for your query.
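As a minimal sketch, one common way to obtain such a plan in Oracle is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. The example assumes the tab1 sample table created above and a default plan table available in your schema; other tools (for example, an IDE's explain button) can show the same information.

```sql
-- Store the plan for the constrained query in the plan table
explain plan for
select id,pad from (
 select id,pad,
  row_number() over (order by id) as rnum
 from tab1
) where rnum between 1 and 5;

-- Print the stored plan, including the predicate section
select * from table(dbms_xplan.display);
```

For the constrained query, the plan looks like this: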

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     5 |  7640 |       |   762K  (1)| 00:00:30 |
|*  1 |  VIEW                    |      |     5 |  7640 |       |   762K  (1)| 00:00:30 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000K|  2866M|  3906M|   762K  (1)| 00:00:30 |
|   3 |    TABLE ACCESS FULL     | TAB1 |  1000K|  2866M|       |   135K  (1)| 00:00:06 |
-----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNUM">=1 AND "RNUM"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "ID")<=5)

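As a result, you now have to go through all the records (in your case, perform all the joins), and that kills the performance.

To prove this, run the simple, well-performing query with the fetch all option or with an added order by clause. You will most likely get the same poor performance as with the second query.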
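The order by variant of this experiment might look like the sketch below. It reuses the tab1 sample table; set timing on is the SQL*Plus setting that prints the elapsed time, so adjust it if you use a different client.

```sql
-- SQL*Plus: report the elapsed time of each statement
set timing on

-- The first query with an explicit sort added: every row must be
-- read and sorted before the first row can be returned
select id,pad from (
 select id,pad,row_number() over (order by id) as rnum
 from tab1
)
order by id;
```

If the explanation above holds, the first page no longer appears immediately, because the sort has to finish first.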

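Final remark

You can use the row_limiting_clause instead of ROW_NUMBER().

Move the sort column from the order by inside the row_number clause to the query's order by, and use offset / fetch first to limit the result.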

select id,pad
 from tab1
order by id
fetch first 5 rows only;

Under the covers, you will see the same execution plan as above, using WINDOW SORT PUSHED RANK.
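For pages after the first one, the offset part of the clause comes into play. The following is a sketch only, assuming Oracle 12c or later (where the row limiting clause is available) and a page size of 5 as in the question:

```sql
-- Page 2 with a page size of 5:
-- skip the first 5 rows in id order, then return the next 5
select id,pad
from tab1
order by id
offset 5 rows fetch next 5 rows only;
```

This is mainly a more readable way to write the pagination. Since the rows still have to be sorted before any of them can be skipped or returned, it should not be expected to avoid the cost described above.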