Impala Last_Value() 未按预期给出结果

问题描述

我在 Impala 中有一个表,其中有 Unix-Time 形式的时间信息(频率为 1 毫秒)和有关三个变量的信息,如下所示:

O(N^4 log(N))

我想重新采样数据并获取新时间窗口的最后一个值。例如,如果我想重新采样为 10 秒频率,则输出应该是 10 秒窗口的最后一个值,如下所示:

ts          Val1        Val2        Val3        
1.60669E+12 7541.76     0.55964607  267.1613        
1.60669E+12 7543.04     0.5607262   267.27805       
1.60669E+12 7543.04     0.5607241   267.22308       
1.60669E+12 7543.6797   0.56109643  267.25974       
1.60669E+12 7543.6797   0.56107396  267.30624       
1.60669E+12 7543.6797   0.56170875  267.2643    

为了得到这个结果,我正在运行以下查询:

ts                      val1_Last       Val2_Last       Val3_Last   
2020-11-29 22:30:00     7541.76         0.55964607      267.1613
2020-11-29 22:30:10     7542.3994       0.5613486       267.31238
2020-11-29 22:30:20     7542.3994       0.5601791       267.22842
2020-11-29 22:30:30     7544.32         0.56069416      267.20248

我在一些论坛上读到 select distinct * from ( select ts,last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,last_value(Val2) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val2,last_value(Val3) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val3 from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts,Val1 as Val1,Val2 as Val2,Val3 as Val3 FROM Sensor_Data.Table where unit='Unit1' and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt order by ts 有时会导致问题,所以我尝试使用 LAST_VALUE()FIRST_VALUE 来达到同样的目的。查询如下:

ORDER BY DESC

但在这两种情况下,我都没有得到预期的结果。重采样时间 select distinct * from ( select ts,first_value(Val1) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val1,first_value(Val2) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val2,first_value(Val3) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val3 from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts,val2 as Val2,Val3 as Val3 FROM product_sofcdtw_ops.as_operated_full_backup where unit='FCS05-09' and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt order by ts 按预期出现(窗口为 10 秒),但我在 0-9 秒、10-19 秒之间获得了 tsVal1Val2 的随机值,... 窗户。

这个查询在逻辑上看起来不错,我没有发现任何问题。任何人都可以解释为什么我没有使用这个查询得到正确的答案。

谢谢!!!

解决方法

问题是这一行:

last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,

您正在按同一列 ts 进行分区和排序——因此没有排序(或者更具体地说,按在整个分区中保持不变的值排序会导致任意排序)。您需要保留 原始 ts 才能完成这项工作,并将其用于订购:

select ts,last_value(Val1) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val1,last_value(Val2) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val2,last_value(Val3) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val3 
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts_10,t.*
      FROM Sensor_Data.Table t
      WHERE unit = 'Unit1' AND
            cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
     ) t

顺便说一句,last_value() 的问题在于,当您忽略窗口框架(窗口函数规范的 rowsrange 部分)时,它会出现意外行为。

问题在于默认规范是 range between unbounded preceding and current row,这意味着 last_value() 只是选取当前行中的值。

另一方面,first_value() 在默认框架下工作正常。但是,如果您包含显式框架,则两者是等效的。

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...