Hive query: is there a good way to optimize these unions?

Problem description

All, I'm new to Hive and to query optimization in general.

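I have a union of three more or less identical queries. The only reason the unions exist is that my source table has no rows for weekend or holiday dates, and for those dates I need to carry forward some base values from the previous calendar day that does exist in the source table. The date_add offset (1, 2, or 3 days) is really the only difference between the three unioned queries.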

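Is there a way to combine these three queries into one, or otherwise do this more efficiently?

I'm a bit stuck, although I have already brought the whole process down from 45 minutes to 4 1/2 minutes. I'm just not sure how to optimize these unions. Please help :/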

   UNION ALL 

--ADDING 1 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name,cast(date_add(performance_end_date,1) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1

UNION ALL 

--ADDING 2 DAYS TO FRIDAYS--
select * from
(
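SELECT a.portfolio_name,cast(date_add(performance_end_date,2) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)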
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1

UNION ALL 

--ADDING 3 DAYS To Holidays
select * from
(
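SELECT a.portfolio_name,cast(date_add(performance_end_date,3) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)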
and  a.year=2020 and a.month=09
and performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000')
) a
where row_no= 1

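Solution

If, exactly as you wrote, the only difference is the argument to date_add, you can take the SQL from one of the unions and cross join it with a union of the constants 1, 2, and 3. A cross join may perform better than the union, though that also depends on the number of rows in the source. You can also filter on the row number before the cross join so that fewer rows are joined. In the example posted below I did not filter on the row number.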

Edit 1: regarding the comment about date1 or date2, you can do it just as simply as you wrote it: in the WHERE clause, put date_column = something OR date_column = something.

The query would look like this:

SELECT a.portfolio_name,
       Cast(Date_add(a.performance_end_date,crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.performance_end_date,
       a.row_no
FROM   (SELECT a.portfolio_name,
               -- Cast(Date_add(performance_end_date,1) AS TIMESTAMP) AS performance_end_date,
               a.performance_end_date,
               Cast(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               Row_number()
                 OVER (
                   partition BY a.portfolio_code,a.performance_end_date
                   ORDER BY a.nav_id DESC) AS row_no
        FROM   carsales a
        WHERE  a.portfolio_code IN ( '1994',1998,2523 )
               AND a.year = 2020
               AND a.month = 09
               AND Dayofweek(performance_end_date) = 6) a
       CROSS JOIN (SELECT 1 crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) crs
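
The answer notes that the row-number filter can be applied before the cross join so that fewer rows get multiplied by three. A minimal sketch of that variant, assuming the same carsales table and columns as in the question; the aliases ranked, latest, and offsets are made up for illustration, and the query has not been run against the real data:

```sql
-- Sketch only: keep the newest row per portfolio and date first,
-- then fan each surviving row out to offsets 1, 2 and 3.
SELECT latest.portfolio_name,
       CAST(date_add(latest.performance_end_date, offsets.crs) AS TIMESTAMP) AS performance_end_date,
       latest.car_return,
       latest.nav,
       latest.nav_id
FROM   (SELECT ranked.*
        FROM   (SELECT a.portfolio_name,
                       a.performance_end_date,
                       CAST(0.0000000 AS STRING) AS car_return,
                       a.nav,
                       a.nav_id,
                       row_number() OVER (PARTITION BY a.portfolio_code, a.performance_end_date
                                          ORDER BY a.nav_id DESC) AS row_no
                FROM   carsales a
                WHERE  a.portfolio_code IN ('1994', 1998, 2523)  -- literals copied from the question
                       AND a.year = 2020
                       AND a.month = 9
                       AND dayofweek(a.performance_end_date) = 6) ranked
        WHERE  ranked.row_no = 1) latest   -- filter happens before the join
       CROSS JOIN (SELECT 1 AS crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) offsets;
```

Note that this applies all three offsets to every Friday row. The original third query used a holiday date list in its WHERE clause instead, so that condition would still need to be handled separately if it matters.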

In addition to @F.Lazarescu's answer, you can also rewrite the CROSS JOIN subquery.

Instead of this:

CROSS JOIN (SELECT 1 crs 
                   UNION ALL 
                   SELECT 2 
                   UNION ALL 
                   SELECT 3) crs 

Use the stack() UDTF, which will run faster:

CROSS JOIN (SELECT stack(3,1,2,3) as crs) crs
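
For reference, stack(n, v1, ..., vn) is a built-in Hive table-generating function that splits its values into n rows. A small sketch of what that subquery produces on its own, assuming a Hive version that allows SELECT without a FROM clause:

```sql
-- Splits the three values 1, 2, 3 into three rows with a single column named crs.
SELECT stack(3, 1, 2, 3) AS crs;
```

Because the three rows come from one function call rather than three unioned SELECT statements, the constant side of the join is a single small subquery.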