问题描述
$ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx --datastax-java-driver .basic.contact-points 123.123.123.123 -query "select count(*) from soMetable with where on cluster column and partial pk -- 允许过滤" --connector.name json --driver.protocol.compression LZ4 --connector。 json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.continuousPaging.enabled false --executor.maxpersecond 2500 --driver.socket.timeout 240000
Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | Failed | rows/s | p50ms | p99ms | p999ms
1 | 1 | 0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
这里是 dsbulk 记录 --
cassandra@somehost> cd logs
cassandra@somehost> cd COUNT_20210423-070104-108326/
cassandra@somehost> ls
operation.log unload-errors.log
cassandra@somehost> cat operation.log
2021-04-23 07:01:04 WARN Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO Records: total: 1,successful: 0,Failed: 1
2021-04-23 07:02:55 INFO Memory usage: used: 212 MB,free: 1,922 MB,allocated: 2,135 MB,available: 27,305 MB,total gc count: 4,total gc time: 98 ms
2021-04-23 07:02:55 INFO Reads: total: 1,Failed: 1,in-flight: 0
2021-04-23 07:02:55 INFO Throughput: 0 reads/second
2021-04-23 07:02:55 INFO Latencies: mean 109,790.10,75p 110,058.54,99p 110,999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO Final stats:
2021-04-23 07:02:58 INFO Records: total: 1,Failed: 1
2021-04-23 07:02:58 INFO Memory usage: used: 251 MB,883 MB,total gc time: 98 ms
2021-04-23 07:02:58 INFO Reads: total: 1,in-flight: 0
2021-04-23 07:02:58 INFO Throughput: 0 reads/second
2021-04-23 07:02:58 INFO Latencies: mean 109,058.54 milliseconds
cassandra@somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement@1083fef9 [0 values,idempotence: <UNSET>,CL: <UNSET>,serial CL: <UNSET>,timestamp: <UNSET>,timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage$1(ResultSubscription.java:372)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$700(CqlRequestHandler.java:94)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
Cassandra 的 system.log 片段----
DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539 MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>,total time 10004 msec,timeout 10000 msec/cross-node
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,540 MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,551 StatusLogger.java:114 -
Pool Name Active Pending Completed Blocked All Time Blocked
ReadStage 1 0 1736872997 0 0
ContinuousPagingStage 0 0 586 0 0
RequestResponseStage 0 0 1483193130 0 0
ReadRepairstage 0 0 9079516 0 0
CounterMutationStage 0 0 0 0 0
MutationStage 0 0 351841038 0 0
ViewMutationStage 0 0 0 0 0
CommitLogArchiver 0 0 32961 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 12034828 0 0
MemtableReclaimMemory 0 0 68612 0 0
PendingRangeCalculator 0 0 9 0 0
AntiCompactionExecutor 0 0 0 0 0
GossipStage 0 0 20137208 0 0
SecondaryIndexManagement 0 0 0 0 0
Hintsdispatcher 0 0 3798 0 0
MigrationStage 0 0 8 0 0
MemtablePostFlush 0 0 338955 0 0
PerdiskMemtableFlushWriter_0 0 0 66297 0 0
ValidationExecutor 0 0 247600 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 41757 0 0
InternalResponseStage 0 0 525242 0 0
AntiEntropyStage 0 0 767527 0 0
CacheCleanupExecutor 0 0 0 0 0
Native-Transport-Requests 0 0 958717934 0 65
CompactionManager 0 0
MessagingService n/a 0/0
Cache Type Size Capacity KeysToSave
KeyCache 104857216 104857600 all
RowCache 0 0 all
解决方法
问题是您在 DSBulk 中运行 unload
命令来执行 SELECT COUNT()
,这意味着它必须执行全表扫描才能返回单行。
此外,不建议使用 ALLOW FILTERING
,除非您将查询限制为单个分区。无论如何,即使在最佳情况下,ALLOW FILTERING
的性能也是非常不可预测的。
我建议您改用 DSBulk count
命令,该命令针对 Cassandra 中的行或分区计数进行了优化。有关详细信息,请参阅 Counting data with DSBulk example。
此 DSBulk Counting blog post 中还有其他示例,Alex Ott 已经在他的回答中链接了这些示例。干杯!
,在令牌范围上使用附加条件扩展 select count(*) from sometable with where on clustering column and partial pk -- allow filtering
,如下所示:and partial pk token(full_pk) > :start and token(full_pk) <= :end
- 在这种情况下,DSBulk 将对发送到多个节点的特定令牌范围执行许多查询,并且不会t 像您的情况一样在单个节点上创建负载。
查看 documentation for -query option 以及本系列关于 DSBulk 的博客文章中的第 4 篇博客,可以提供更多信息和示例:1、2、3 、4、5、6