CQL / Cassandra: select only the rows that have the maximum value in one of the columns

Problem description

I need to find the rows with a given stationid whose time1 is at or after a specified time and which have the largest time2 (for each time1).

The table is created as follows:

CREATE TABLE forec (
    stationid int,
    time1 timestamp,
    time2 timestamp,
    value double,
    PRIMARY KEY ((stationid), time1, time2)
) WITH CLUSTERING ORDER BY (time1 DESC)

Let's assume the data in the table looks like this:

    +------------+-----------------------+----------------------+--------+
    | stationid  | time1                 |  time2               |  value |
    +------------+-----------------------+----------------------+--------+
    | 1          | 2020-10-21 06:00:00   | 2020-10-21 05:00:00  | 1      |
    | 1          | 2020-10-21 06:00:00   | 2020-10-21 04:00:00  | 2      |
    | 1          | 2020-10-21 06:00:00   | 2020-10-21 03:00:00  | 3      |
    | 1          | 2020-10-21 05:00:00   | 2020-10-21 04:00:00  | 4      |
    | 1          | 2020-10-21 05:00:00   | 2020-10-21 03:00:00  | 5      |
    | 1          | 2020-10-21 04:00:00   | 2020-10-21 02:00:00  | 6      |
    +------------+-----------------------+----------------------+--------+

I want to query: give me all rows where stationid = 1 and time1 >= 2020-10-21 05:00:00 and time2 has the maximum value for its time1. The query should return the following rows:

    +------------+-----------------------+----------------------+--------+
    | stationid  | time1                 |  time2               |  value |
    +------------+-----------------------+----------------------+--------+
    | 1          | 2020-10-21 06:00:00   | 2020-10-21 05:00:00  | 1      |
    | 1          | 2020-10-21 05:00:00   | 2020-10-21 04:00:00  | 4      |
    +------------+-----------------------+----------------------+--------+

I know I can query like this

SELECT * FROM forec WHERE stationid = 1 AND time1 >= '2020-10-21 05:00:00';

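and then filter the results on the client (keeping only the rows with the largest time2), but I would like to know whether this can be done more efficiently, by filtering the results on the Cassandra side.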

Or should I change the table model?
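
For reference, the client-side filtering described in the question could look roughly like the sketch below. It is written in Python with hand-built rows standing in for a driver result set; the `Row` type, the `latest_per_time1` helper, and the `ts` shortcut are hypothetical names introduced only for this illustration.

```python
from datetime import datetime
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical stand-in for one row of the forec table.
    stationid: int
    time1: datetime
    time2: datetime
    value: float


def latest_per_time1(rows):
    """Keep, for each time1, only the row with the largest time2."""
    best = {}
    for row in rows:
        current = best.get(row.time1)
        if current is None or row.time2 > current.time2:
            best[row.time1] = row
    # Newest time1 first, as in the expected result of the question.
    return sorted(best.values(), key=lambda r: r.time1, reverse=True)


def ts(hour):
    # Shortcut for a timestamp on 2020-10-21 at the given hour.
    return datetime(2020, 10, 21, hour)


# Rows with stationid = 1 and time1 >= 05:00, taken from the sample data.
rows = [
    Row(1, ts(6), ts(5), 1.0),
    Row(1, ts(6), ts(4), 2.0),
    Row(1, ts(6), ts(3), 3.0),
    Row(1, ts(5), ts(4), 4.0),
    Row(1, ts(5), ts(3), 5.0),
]

result = latest_per_time1(rows)
for row in result:
    print(row.time1, row.time2, row.value)
```

This still transfers every matching row to the client, which is the cost the question is trying to avoid.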

Solution

A solution using a UDA / UDFs:

State function:

CREATE OR REPLACE FUNCTION curValState (
    state tuple<timestamp,double>,
    time timestamp,
    value double
)
CALLED ON NULL INPUT
RETURNS tuple<timestamp,double>
LANGUAGE java AS '
    // Skip rows where either input is null.
    if (time != null && value != null) {
        if (state == null) {
            // First row of the group: build the initial (time, value) tuple.
            com.datastax.driver.core.TupleType tupleType = com.datastax.driver.core.TupleType.of(
                com.datastax.driver.core.ProtocolVersion.NEWEST_SUPPORTED,
                com.datastax.driver.core.CodecRegistry.DEFAULT_INSTANCE,
                com.datastax.driver.core.DataType.timestamp(),
                com.datastax.driver.core.DataType.cdouble());
            state = tupleType.newValue(time, value);
        } else {
            // Keep the pair with the later timestamp.
            if (state.getTimestamp(0).compareTo(time) < 0) {
                state.setTimestamp(0, time);
                state.setDouble(1, value);
            }
        }
    }
    return state;
';

Final function:

CREATE OR REPLACE FUNCTION finalVal ( state tuple<timestamp,double> ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS 'return state.getDouble(1);';

Aggregate:

CREATE OR REPLACE AGGREGATE valueatlatesttime (timestamp,double) SFUNC curValState STYPE tuple<timestamp,double> FINALFUNC finalVal INITCOND null;
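
To make the state/final function pair easier to follow, here is a rough Python simulation of what the aggregate computes for one group. It only mirrors the logic of the CQL functions above and does not run inside Cassandra; `cur_val_state` and `final_val` are illustrative names, and the input rows are taken from the sample data.

```python
from datetime import datetime
from functools import reduce


def cur_val_state(state, time, value):
    """Mirror of curValState: keep the (time, value) pair with the latest time."""
    if time is not None and value is not None:
        if state is None or state[0] < time:
            state = (time, value)
    return state


def final_val(state):
    """Mirror of finalVal: return the value component of the state."""
    return state[1]


# Rows of one group (time1 = 06:00) as (time2, value) pairs, in arbitrary order.
group = [
    (datetime(2020, 10, 21, 4), 2.0),
    (datetime(2020, 10, 21, 5), 1.0),
    (datetime(2020, 10, 21, 3), 3.0),
]

# INITCOND null corresponds to starting the fold from None.
state = reduce(lambda s, row: cur_val_state(s, *row), group, None)
value_at_max_time2 = final_val(state)
print(value_at_max_time2)
```

Because the state function compares timestamps itself, the result does not depend on the order in which rows are visited.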

Query:

SELECT
  stationid,time1,max(time2) AS max_time2,valueatlatesttime(time2,value) AS value_at_max_time2
FROM
  forec
WHERE
  stationid = 1
AND
  time1 >= '2020-10-21 05:00:00'
GROUP BY time1;

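Edit: according to the Cassandra documentation, "if a column is selected without an aggregate function, in a statement with a GROUP BY, the first value encountered in each group will be returned." The example below therefore only works when time2 is stored in DESC order.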

If you are using a recent version of Cassandra (e.g. 3.11.x), you can do something similar with GROUP BY:

SELECT
  stationid, time1, max(time2) AS max_time2, value
FROM
  forec
WHERE
  stationid = 1
AND
  time1 >= '2020-10-21 05:00:00'
GROUP BY time1;

You will get

cqlsh:test> SELECT stationid, time1, max(time2) AS max_time2, value FROM forec WHERE stationid = 1 AND time1 >= '2020-10-21 05:00:00' GROUP BY time1;

 stationid | time1                           | max_time2                       | value
-----------+---------------------------------+---------------------------------+-------
         1 | 2020-10-21 06:00:00.000000+0000 | 2020-10-21 05:00:00.000000+0000 |     1
         1 | 2020-10-21 05:00:00.000000+0000 | 2020-10-21 04:00:00.000000+0000 |     4

(2 rows)

Note that this scans your partition, so keep an eye on the partition size, especially when using timestamps in clustering columns.
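
The dependence of the plain `value` column on the stored order of time2 can be illustrated with a small Python sketch. It only imitates the documented "first value encountered in each group" behaviour on in-memory tuples; `first_value_per_group` is a hypothetical helper, and timestamps are reduced to hours for brevity.

```python
from itertools import groupby


def first_value_per_group(rows):
    """Return (time1, value) using the first row seen for each time1."""
    return [
        (time1, next(group)[2])
        for time1, group in groupby(rows, key=lambda r: r[0])
    ]


# Rows as (time1 hour, time2 hour, value), with time1 in DESC order.
time2_desc = [(6, 5, 1), (6, 4, 2), (6, 3, 3), (5, 4, 4), (5, 3, 5)]
time2_asc = [(6, 3, 3), (6, 4, 2), (6, 5, 1), (5, 3, 5), (5, 4, 4)]

# With time2 stored DESC, the first row of each group has the max time2.
print(first_value_per_group(time2_desc))

# With time2 stored ASC, the first row of each group has the min time2,
# so the value no longer matches max(time2).
print(first_value_per_group(time2_asc))
```

With time2 in ascending order the value comes from the earliest time2 in each group, which is why the GROUP BY shortcut relies on descending storage.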