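Problem description
I need to find the rows with a given stationid whose time1 is at or after a specified time and which, for each time1, have the greatest time2.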
The table is created as follows:
CREATE TABLE forec (
    stationid int,
    time1 timestamp,
    time2 timestamp,
    value double,
    PRIMARY KEY ((stationid), time1, time2)
) WITH CLUSTERING ORDER BY (time1 DESC);
Suppose the table contains the following data:
+-----------+---------------------+---------------------+-------+
| stationid | time1               | time2               | value |
+-----------+---------------------+---------------------+-------+
|         1 | 2020-10-21 06:00:00 | 2020-10-21 05:00:00 |     1 |
|         1 | 2020-10-21 06:00:00 | 2020-10-21 04:00:00 |     2 |
|         1 | 2020-10-21 06:00:00 | 2020-10-21 03:00:00 |     3 |
|         1 | 2020-10-21 05:00:00 | 2020-10-21 04:00:00 |     4 |
|         1 | 2020-10-21 05:00:00 | 2020-10-21 03:00:00 |     5 |
|         1 | 2020-10-21 04:00:00 | 2020-10-21 02:00:00 |     6 |
+-----------+---------------------+---------------------+-------+
I want to query: give me all rows where stationid = 1 and time1 >= 2020-10-21 05:00:00 and time2 has its maximum value. The query should return the following rows:
+-----------+---------------------+---------------------+-------+
| stationid | time1               | time2               | value |
+-----------+---------------------+---------------------+-------+
|         1 | 2020-10-21 06:00:00 | 2020-10-21 05:00:00 |     1 |
|         1 | 2020-10-21 05:00:00 | 2020-10-21 04:00:00 |     4 |
+-----------+---------------------+---------------------+-------+
I know I can select all rows with stationid = 1 and time1 >= '2020-10-21 05:00:00' and then filter the results on the client (keeping only the rows with the maximum time2), but I would like to know whether this can be done more efficiently, with the filtering done on the Cassandra side.
Or should I change the table model?
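For comparison, the client-side filtering mentioned above might look like the following minimal sketch. It assumes the rows have already been fetched and are available as plain dicts; the helper names keep_latest_time2 and ts are hypothetical and are not part of any driver API.

```python
from datetime import datetime


def keep_latest_time2(rows):
    """For each time1, keep only the row with the greatest time2."""
    best = {}
    for row in rows:
        current = best.get(row["time1"])
        if current is None or row["time2"] > current["time2"]:
            best[row["time1"]] = row
    # Newest time1 first, matching the table's clustering order.
    return sorted(best.values(), key=lambda r: r["time1"], reverse=True)


def ts(hour):
    """Hypothetical helper: a timestamp on 2020-10-21 at the given hour."""
    return datetime(2020, 10, 21, hour)


# The sample rows with stationid = 1 and time1 >= 05:00.
rows = [
    {"stationid": 1, "time1": ts(6), "time2": ts(5), "value": 1.0},
    {"stationid": 1, "time1": ts(6), "time2": ts(4), "value": 2.0},
    {"stationid": 1, "time1": ts(6), "time2": ts(3), "value": 3.0},
    {"stationid": 1, "time1": ts(5), "time2": ts(4), "value": 4.0},
    {"stationid": 1, "time1": ts(5), "time2": ts(3), "value": 5.0},
]

latest = keep_latest_time2(rows)
for row in latest:
    print(row["time1"], row["time2"], row["value"])
```

This keeps one row per time1 in a single pass, but every matching row still has to travel from Cassandra to the client first, which is the cost the question is trying to avoid.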
Solution
A solution using a UDA / UDF:
State function:
-- Keeps the (time, value) pair with the latest time seen so far.
CREATE OR REPLACE FUNCTION curValState (
    state tuple<timestamp, double>,
    time timestamp,
    value double
)
CALLED ON NULL INPUT
RETURNS tuple<timestamp, double>
LANGUAGE java
AS '
    if (time != null && value != null) {
        if (state == null) {
            com.datastax.driver.core.TupleType tupleType = com.datastax.driver.core.TupleType.of(
                com.datastax.driver.core.ProtocolVersion.NEWEST_SUPPORTED,
                com.datastax.driver.core.CodecRegistry.DEFAULT_INSTANCE,
                com.datastax.driver.core.DataType.timestamp(),
                com.datastax.driver.core.DataType.cdouble());
            state = tupleType.newValue(time, value);
        } else if (state.getTimestamp(0).compareTo(time) < 0) {
            state.setTimestamp(0, time);
            state.setDouble(1, value);
        }
    }
    return state;
';
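Final function:
-- Returns the value stored with the latest time; a null state (no rows aggregated) yields null.
CREATE OR REPLACE FUNCTION finalVal (
    state tuple<timestamp, double>
)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS '
    if (state == null) return null;
    return state.getDouble(1);
';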
Aggregate:
CREATE OR REPLACE AGGREGATE valueatlatesttime (timestamp, double)
    SFUNC curValState
    STYPE tuple<timestamp, double>
    FINALFUNC finalVal
    INITCOND null;
Query:
SELECT
stationid,time1,max(time2) AS max_time2,valueatlatesttime(time2,value) AS value_at_max_time2
FROM
forec
WHERE
stationid = 1
AND
time1 >= '2020-10-21 05:00:00'
GROUP BY time1;
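To make the behaviour of the aggregate easier to follow, the fold it performs within each group can be mirrored in plain Python. This is only an illustrative sketch: cur_val_state, final_val and value_at_latest_time are hypothetical stand-ins for the CQL functions above, and returning None for an empty group is a choice made in this sketch.

```python
from datetime import datetime


def cur_val_state(state, time, value):
    """Mirror of the state function: keep the (time, value) pair with the latest time."""
    if time is not None and value is not None:
        if state is None or state[0] < time:
            state = (time, value)
    return state


def final_val(state):
    """Mirror of the final function: return the value stored with the latest time."""
    return None if state is None else state[1]


def value_at_latest_time(pairs):
    """Fold the state function over (time2, value) pairs, starting from a null state."""
    state = None  # corresponds to INITCOND null
    for time, value in pairs:
        state = cur_val_state(state, time, value)
    return final_val(state)


# The (time2, value) pairs of the time1 = 06:00 group, in arbitrary order.
samples = [
    (datetime(2020, 10, 21, 3), 3.0),
    (datetime(2020, 10, 21, 5), 1.0),
    (datetime(2020, 10, 21, 4), 2.0),
]
print(value_at_latest_time(samples))
```

Because the state always carries the latest time seen so far, the result does not depend on the order in which the rows of a group are read.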
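Edit: according to the Cassandra documentation, "if a column is selected without an aggregate function, in a statement with a GROUP BY, the first value encountered in each group will be returned." The example below therefore only works if time2 is stored in DESC order.
If you are using a recent version of Cassandra (for example 3.11.x), you can use GROUP BY to do something like this: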
SELECT
stationid, time1, max(time2) AS max_time2, value
FROM
forec
WHERE
stationid = 1
AND
time1 >= '2020-10-21 05:00:00'
GROUP BY time1;
You will get:
cqlsh:test> SELECT stationid, time1, max(time2) AS max_time2, value FROM forec WHERE stationid = 1 AND time1 >= '2020-10-21 05:00:00' GROUP BY time1;

 stationid | time1                           | max_time2                       | value
-----------+---------------------------------+---------------------------------+-------
         1 | 2020-10-21 06:00:00.000000+0000 | 2020-10-21 05:00:00.000000+0000 |     1
         1 | 2020-10-21 05:00:00.000000+0000 | 2020-10-21 04:00:00.000000+0000 |     4

(2 rows)
Note that this scans your partition, so keep an eye on the partition size, especially when timestamps are used as clustering columns.
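The dependence on the storage order of time2 can be illustrated with a small Python sketch. It only imitates the "first value encountered in each group" rule on in-memory tuples; first_value_per_group is a hypothetical helper, and the rows are assumed to be (time1, time2, value) tuples listed in the order in which they would be read.

```python
from itertools import groupby


def first_value_per_group(rows_in_read_order):
    """Imitate GROUP BY time1 with a non-aggregated column: take the first value seen per group."""
    return [
        next(group)[2]
        for _, group in groupby(rows_in_read_order, key=lambda row: row[0])
    ]


# time1 DESC, time2 DESC: the first row of each group carries the greatest time2.
time2_desc = [
    ("06:00", "05:00", 1),
    ("06:00", "04:00", 2),
    ("06:00", "03:00", 3),
    ("05:00", "04:00", 4),
    ("05:00", "03:00", 5),
]

# time1 DESC, time2 ASC: the first row of each group carries the smallest time2.
time2_asc = [
    ("06:00", "03:00", 3),
    ("06:00", "04:00", 2),
    ("06:00", "05:00", 1),
    ("05:00", "03:00", 5),
    ("05:00", "04:00", 4),
]

print(first_value_per_group(time2_desc))
print(first_value_per_group(time2_asc))
```

With time2 read in descending order the values match the rows with the greatest time2 (1 and 4). With ascending order the same rule picks 3 and 5, which would no longer correspond to max(time2).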