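Problem description
Hi all, I'm new to Hive and to query optimization in general.
I have a union of three queries that are more or less identical. The only reason these unions exist is that my source table has no weekend or holiday dates, and I need to carry forward some basic values from the previous calendar day that exists in the source table to stand in for the missing holiday/weekend dates. The date_add call is really the only difference among the three unions (1, 2 or 3 days).
Is there a way to combine these three queries into one, or simply a more efficient way to do this?
I'm a bit stuck, although I have already brought the whole process down from 45 minutes to 4 1/2 minutes. I'm just not sure how to optimize these unions. Please help :/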
UNION ALL
--ADDING 1 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name,cast(date_add(performance_end_date,1) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL
--ADDING 2 DAYS TO FRIDAYS--
select * from
(
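SELECT a.portfolio_name,cast(date_add(performance_end_date,2) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)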
and a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL
--ADDING 3 DAYS To Holidays
select * from
(
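SELECT a.portfolio_name,cast(date_add(performance_end_date,3) as timestamp) as performance_end_date,cast(0.0000000 as string) as car_return,a.nav,a.nav_id,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)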
and a.year=2020 and a.month=09
and performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000')
) a
where row_no= 1
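Solution
If it is exactly as you wrote, and the only difference is the argument passed to date_add, you can take the SQL from one of the unions and cross join it with a union of the constants 1, 2 and 3. The cross join may perform better than the unions, though that also depends on the number of rows in the source. You can also filter on the row number before the cross join, so that fewer rows are joined. In the example posted below I did not filter on the row number.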
Edit 1: Regarding the comment about date1 or date2, you can do it neatly, just as you wrote it. In the WHERE clause, put date_column = something OR date_column = something.
The query will look like this:
SELECT a.portfolio_name,
       Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.performance_end_date,
       a.row_no
FROM   (SELECT a.portfolio_name,
               -- Cast(Date_add(performance_end_date,1) AS TIMESTAMP) AS performance_end_date,
               a.performance_end_date,
               Cast(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               Row_number()
                 OVER (
                   partition BY a.portfolio_code, a.performance_end_date
                   ORDER BY a.nav_id DESC) AS row_no
        FROM   carsales a
        WHERE  a.portfolio_code IN ( '1994', 1998, 2523 )
               AND a.year = 2020
               AND a.month = 09
               AND Dayofweek(performance_end_date) = 6) a
       CROSS JOIN (SELECT 1 crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) crs
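The answer mentions filtering on the row number before the cross join but does not show it. A minimal sketch of that variant follows, assuming the same carsales table and columns as in the question; the alias ranked is only illustrative. Filtering first means each Friday row is de-duplicated once and only the surviving rows are multiplied by the three offsets.

```sql
-- Sketch only: keep row_no = 1 first, then fan each surviving row out to +1/+2/+3 days.
SELECT a.portfolio_name,
       CAST(date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id
FROM (
    SELECT ranked.*
    FROM (
        SELECT a.portfolio_name,
               a.performance_end_date,
               CAST(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               row_number() OVER (PARTITION BY a.portfolio_code, a.performance_end_date
                                  ORDER BY a.nav_id DESC) AS row_no
        FROM carsales a
        WHERE a.portfolio_code IN ('1994', 1998, 2523)
          AND a.year = 2020
          AND a.month = 09
          AND dayofweek(a.performance_end_date) = 6   -- Fridays (1 = Sunday in Hive)
    ) ranked
    WHERE ranked.row_no = 1                           -- latest nav_id per portfolio and date
) a
CROSS JOIN (SELECT 1 AS crs
            UNION ALL
            SELECT 2
            UNION ALL
            SELECT 3) crs;
```

Whether this is actually faster than filtering after the join depends on the data volume, so it is worth comparing the two plans with EXPLAIN.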
In addition to @F.Lazarescu's answer, you can also rewrite the CROSS JOIN subquery.
Instead of this:
CROSS JOIN (SELECT 1 crs
UNION ALL
SELECT 2
UNION ALL
SELECT 3) crs
Use the stack() UDTF, which will execute faster:
CROSS JOIN (SELECT stack(3,1,2,3) as crs) crs
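One difference from the original unions is worth noting: a plain cross join with 1, 2 and 3 adds a +3 row for every Friday, whereas the question only adds 3 days for the listed pre-holiday dates. The sketch below is one possible way to keep that behaviour in a single query, combining stack() with an OR condition in the WHERE clause. It assumes the pre-holiday dates are themselves Fridays (true for the two dates in the question) and the CTE name fridays is hypothetical.

```sql
-- Sketch only: +1 and +2 for every Friday, +3 only for Fridays before a Monday holiday.
WITH fridays AS (
    SELECT ranked.*
    FROM (
        SELECT a.portfolio_name,
               a.performance_end_date,
               CAST(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               row_number() OVER (PARTITION BY a.portfolio_code, a.performance_end_date
                                  ORDER BY a.nav_id DESC) AS row_no
        FROM carsales a
        WHERE a.portfolio_code IN ('1994', 1998, 2523)
          AND a.year = 2020
          AND a.month = 09
          AND dayofweek(a.performance_end_date) = 6
    ) ranked
    WHERE ranked.row_no = 1
)
SELECT f.portfolio_name,
       CAST(date_add(f.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       f.car_return,
       f.nav,
       f.nav_id
FROM fridays f
CROSS JOIN (SELECT stack(3, 1, 2, 3) AS crs) crs
WHERE crs.crs <= 2                                    -- Saturday and Sunday
   OR f.performance_end_date IN ('2020-09-04 00:00:00.000',
                                 '2020-10-09 00:00:00.000');  -- holiday Mondays
```

If a holiday can fall on a day other than Monday, the Friday filter inside the CTE would need to be relaxed in the same date_column = something OR date_column = something style.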