READ-FAILED observed after 2.5 hours when running YCSB on Cassandra

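Problem description

I am new to Cassandra and YCSB, and I am trying to run a YCSB benchmark against a 3-node Cassandra cluster built with docker-compose.

The YCSB load phase completed in 4 hours without any errors or issues, but in the run phase I see a "READ-Failed" error about 2.5 hours (9212 seconds) into the run. I have repeated the same test several times and hit the same problem each time, and I don't know why.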

.
.
2021-05-27 22:22:53:019 9208 sec: 8625003 operations; 661 current ops/sec; est completion in 6 days 1 hour [READ: Count=133,Max=89599,Min=311,Avg=5145.44,90=10551,99=78783,99.9=89599,99.99=89599] [READ-MODIFY-WRITE: Count=69,Max=26751,Min=707,Avg=4425.57,90=11583,99=18271,99.9=26751,99.99=26751] [INSERT: Count=450,Max=1432,Min=216,Avg=537.25,90=818,99=1128,99.9=1432,99.99=1432] [UPDATE: Count=145,Max=1471,Min=184,Avg=472.85,90=733,99=1284,99.9=1471,99.99=1471]
2021-05-27 22:22:54:019 9209 sec: 8625668 operations; 665 current ops/sec; est completion in 6 days 1 hour [READ: Count=127,Max=66367,Min=334,Avg=4931.35,90=12767,99=36127,99.9=66367,99.99=66367] [READ-MODIFY-WRITE: Count=64,Max=36543,Min=709,Avg=4670.2,90=13511,99=34143,99.9=36543,99.99=36543] [INSERT: Count=458,Max=2303,Min=237,Avg=589.22,90=869,99=1195,99.9=2303,99.99=2303] [UPDATE: Count=144,Max=1190,Min=218,Avg=501.5,90=759,99=1186,99.9=1190,99.99=1190]
2021-05-27 22:22:55:019 9210 sec: 8626279 operations; 611 current ops/sec; est completion in 6 days 1 hour [READ: Count=110,Max=98495,Min=399,Avg=6190.99,90=12063,99=38431,99.9=98495,99.99=98495] [READ-MODIFY-WRITE: Count=55,Max=100095,Min=692,Avg=8793.56,90=15983,99=39999,99.9=100095,99.99=100095] [INSERT: Count=441,Max=1659,Min=241,Avg=624.24,90=969,99=1327,99.9=1659,99.99=1659] [UPDATE: Count=119,Max=1395,Min=187,Avg=571.55,90=909,99=1310,99.9=1395,99.99=1395]
2021-05-27 22:22:56:019 9211 sec: 8626842 operations; 563 current ops/sec; est completion in 6 days 1 hour [READ: Count=118,Max=97215,Min=318,Avg=5499.74,90=10463,99=93055,99.9=97215,99.99=97215] [READ-MODIFY-WRITE: Count=45,Min=742,Avg=5810.96,90=8807,99=98495,99.99=98495] [INSERT: Count=385,Max=1252,Min=239,Avg=616.27,90=924,99=1163,99.9=1252,99.99=1252] [UPDATE: Count=101,Max=1327,Min=195,Avg=580.12,90=904,99=1097,99.9=1327,99.99=1327]
2021-05-27 22:22:57:019 9212 sec: 8627010 operations; 168 current ops/sec; est completion in 6 days 1 hour [READ: Count=33,Max=90367,Min=732,Avg=12685.67,90=35679,99=90367,99.9=90367,99.99=90367] [READ-MODIFY-WRITE: Count=18,Max=93183,Min=1121,Avg=17020.33,90=36895,99=93183,99.9=93183,99.99=93183] [INSERT: Count=120,Max=109951,Min=325,Avg=2155.85,90=3283,99=7943,99.9=109951,99.99=109951] [UPDATE: Count=35,Max=11567,Min=302,Avg=1142.29,90=2081,99=11567,99.9=11567,99.99=11567] [READ-Failed: Count=1,Max=23615,Min=23600,Avg=23608,90=23615,99=23615,99.9=23615,99.99=23615]
2021-05-27 22:22:58:019 9213 sec: 8627523 operations; 513 current ops/sec; est completion in 6 days 1 hour [READ: Count=87,Max=97151,Min=417,Avg=8968.98,90=14639,99=67967,99.9=97151,99.99=97151] [READ-MODIFY-WRITE: Count=44,Max=62303,Min=654,Avg=7554.91,90=14047,99=62303,99.9=62303,99.99=62303] [INSERT: Count=378,Max=1220,Min=240,Avg=467.85,90=686,99=1030,99.9=1220,99.99=1220] [UPDATE: Count=97,Max=1017,Min=217,Avg=411.89,90=649,99=861,99.9=1017,99.99=1017] [READ-Failed: Count=0,Max=0,Min=9223372036854775807,Avg=NaN,90=0,99=0,99.9=0,99.99=0]
2021-05-27 22:22:59:019 9214 sec: 8628119 operations; 596 current ops/sec; est completion in 6 days 1 hour [READ: Count=115,Max=112063,Avg=6460.7,90=12127,99=90943,99.9=112063,99.99=112063] [READ-MODIFY-WRITE: Count=58,Max=91711,Min=788,Avg=6967.95,90=13015,99=60575,99.9=91711,99.99=91711] [INSERT: Count=423,Max=1359,Min=234,Avg=473.31,90=708,99=895,99.9=1359,99.99=1359] [UPDATE: Count=108,Max=1033,Min=210,Avg=429.63,90=637,99=1031,99.9=1033,99.99=1033] [READ-Failed: Count=0,99.99=0]
2021-05-27 22:23:00:019 9215 sec: 8628679 operations; 560 current ops/sec; est completion in 6 days 1 hour [READ: Count=117,Max=115071,Min=327,Avg=6498.37,90=16143,99=64863,99.9=115071,99.99=115071] [READ-MODIFY-WRITE: Count=66,Max=65599,Min=607,Avg=6775.21,90=17151,99=48191,99.9=65599,99.99=65599] [INSERT: Count=391,Max=1137,Avg=466.95,90=711,99=1021,99.9=1137,99.99=1137] [UPDATE: Count=118,Max=1338,Min=191,Avg=438.92,90=674,99=1012,99.9=1338,99.99=1338] [READ-Failed: Count=0,99.99=0]
2021-05-27 22:23:01:019 9216 sec: 8629411 operations; 732 current ops/sec; est completion in 6 days 1 hour [READ: Count=139,Max=94143,Min=390,Avg=5108.03,90=10015,99=59999,99.9=94143,99.99=94143] [READ-MODIFY-WRITE: Count=71,Max=95039,Min=597,Avg=5881.15,90=8959,99=41823,99.9=95039,99.99=95039] [INSERT: Count=524,Max=1256,Min=200,Avg=443.07,90=639,99=1023,99.9=1218,99.99=1256] [UPDATE: Count=142,Max=988,Min=174,Avg=404.29,90=659,99=926,99.9=988,99.99=988] [READ-Failed: Count=0,99.99=0]
2021-05-27 22:23:02:019 9217 sec: 8629929 operations; 518 current ops/sec; est completion in 6 days 1 hour [READ: Count=116,Max=103615,Min=362,Avg=6558.6,90=12535,99=89599,99.9=103615,99.99=103615] [READ-MODIFY-WRITE: Count=55,Max=103999,Min=619,Avg=7671.18,90=15127,99=19727,99.9=103999,99.99=103999] [INSERT: Count=344,Max=960,Min=233,Avg=481.37,90=683,99=892,99.9=960,99.99=960] [UPDATE: Count=111,Max=818,Min=189,Avg=402.95,90=596,99=779,99.9=818,99.99=818] [READ-Failed: Count=0,99.99=0]
.
.
.

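However, when I run the benchmark against MongoDB, it runs fine and I see no errors. Please let me know whether any settings or parameters need to be changed in the Cassandra yml deployment or when running YCSB against a Cassandra cluster.

If you need more logs, please let me know and I will upload them. For now, I have uploaded 2 log files (on GitHub): one with the docker and Cassandra logs, and another with the YCSB run output.

Any help is appreciated.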

[ycsb_logs.txt] https://github.com/neekhraashish/logs/blob/main/ycsb_logs.txt

[docker_cassandra_logs.txt] https://github.com/neekhraashish/logs/blob/main/docker_cassandra_logs.txt

Thanks

Solution

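Looking at the Cassandra logs, the cluster is not in a healthy state. A couple of things stand out:

  • Commit log sync warnings - these indicate that the underlying IO is not keeping up with the commit log being written to disk.
  • Dropped mutations - many operations are being dropped between the nodes and then come back as synchronous read repairs when digest mismatches are noticed on read. These read repairs are also failing frequently.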
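
To get a rough sense of how often each of these symptoms shows up, the Cassandra log can be tallied with a few lines of Python. This is only a sketch: the two keyword patterns are assumptions about how the warnings are worded and may differ between Cassandra versions, so check them against your own log before relying on the counts. The function name `tally_warnings` is made up for this example.

```python
import re
from collections import Counter

# Assumed keyword patterns; verify them against the wording in your own logs.
PATTERNS = {
    "commit_log_sync": re.compile(r"commit log sync", re.IGNORECASE),
    "dropped_messages": re.compile(r"messages were dropped", re.IGNORECASE),
}


def tally_warnings(lines):
    """Count the log lines that match each assumed warning pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Example usage (the file name is taken from the question's uploaded logs):
# with open("docker_cassandra_logs.txt", encoding="utf-8", errors="replace") as fh:
#     print(dict(tally_warnings(fh)))
```

If both counters climb steadily through the run rather than spiking once, that would be consistent with the disk not keeping up, rather than a one-off hiccup.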

More details on how the storage/IO is configured would be useful.
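
To line the failures up with the warnings on the Cassandra side, it can help to pull out the exact seconds at which YCSB reported failed operations. Below is a minimal sketch that assumes the status-line format pasted in the question (`<timestamp> <elapsed> sec: ... [OP: Count=N,...]`). The function name `failed_intervals` is made up for this example.

```python
import re

# Matches the elapsed-seconds field, e.g. "... 9212 sec: 8627010 operations; ..."
ELAPSED_RE = re.compile(r"\s(\d+) sec: ")
# Matches the start of one bracketed metric group, e.g. "[READ-Failed: Count=1,"
GROUP_RE = re.compile(r"\[([A-Za-z-]+): Count=(\d+)")


def failed_intervals(lines):
    """Return (elapsed_sec, operation, count) for each *-Failed group with Count > 0."""
    hits = []
    for line in lines:
        elapsed = ELAPSED_RE.search(line)
        if not elapsed:
            continue  # not a YCSB status line
        for op, count in GROUP_RE.findall(line):
            if op.upper().endswith("-FAILED") and int(count) > 0:
                hits.append((int(elapsed.group(1)), op, int(count)))
    return hits


# Example usage (the file name is taken from the question's uploaded logs):
# with open("ycsb_logs.txt", encoding="utf-8", errors="replace") as fh:
#     for sec, op, count in failed_intervals(fh):
#         print(sec, op, count)
```

The wall-clock timestamps at the start of the matching lines can then be compared with the timestamps of the commit log and dropped-message warnings on each node.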