ClickHouse fails with error "DirectoryMonitor: Checksum of extra info doesn't match: corrupted data"

Problem description

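My .bin file contained malformed data (my fault). ClickHouse could not insert it into the table and stopped processing the rest of the data. I opened the binary file in an editor and fixed the bad field. After that, ClickHouse picked up the file and reported an invalid checksum: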

default.affiliate_program.DirectoryMonitor: Code: 40,e.displayText() = 
DB::Exception: Checksum of extra info doesn't match: corrupted data. Reference: cb322c17e14d6816abfcdc16842e7bdd. Actual: f4afe41e77b9a92bfa4048648a3aebbb.,

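ClickHouse then moved the file to the broken folder and carried on. Is it possible to change the checksum, or is there some other way to re-insert the file I edited?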

Solution

https://github.com/ClickHouse/ClickHouse/issues/16005

You cannot simply edit the .bin files of a Distributed table. They carry a built-in checksum, so it would have to be recalculated, which is cumbersome.
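The toy model below, written in Python, shows why hand-editing fails. It is an illustration only. ClickHouse computes its own 128-bit hash over the header (the Reference/Actual values in the error above are 32 hex digits), and it lays the file out differently. MD5 and the digest-first layout used here are stand-in assumptions, not the real .bin format, and the helper names frame and verify are hypothetical.

```python
import hashlib

def frame(header: bytes) -> bytes:
    """Toy format (assumption): 16-byte MD5 digest of the header, then the header."""
    return hashlib.md5(header).digest() + header

def verify(framed: bytes) -> bytes:
    """Recompute the digest and compare it with the stored one, as a reader would."""
    stored, header = framed[:16], framed[16:]
    actual = hashlib.md5(header).digest()
    if stored != actual:
        raise ValueError(
            "Checksum of extra info doesn't match: corrupted data. "
            f"Reference: {stored.hex()}. Actual: {actual.hex()}."
        )
    return header

original = frame(b"INSERT INTO database.table VALUES")
verify(original)  # unchanged file: the stored digest still matches

# Simulate fixing a field in an editor: flip one bit in the header, not in the digest.
edited = bytearray(original)
edited[-1] ^= 0x01
try:
    verify(bytes(edited))
except ValueError as err:
    print(err)  # the stored digest no longer matches the edited header
```

Any edit to the protected bytes requires recomputing the stored digest with exactly the same hash function. That is why reading the data back through ClickHouse is the simpler route.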

But there is another way: you can SELECT from the non-corrupted .bin files and INSERT the data manually.

More details: https://github.com/ClickHouse/ClickHouse/pull/9653


I'm just expanding on @Denny Crane's original question and answer.


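I ran into a similar error because the default profile was misconfigured: it was marked readonly. As a result, later inserts into the Distributed table failed with errors like this: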

database.table.DirectoryMonitor: Code: 164,e.displayText() = 
DB::Exception: Received from **:9000. 
DB::Exception: default: Cannot execute query in readonly mode. Stack trace:

0. Poco::Exception::Exception(std::__1::basic_string<char,std::__1::char_traits<char>,std::__1::allocator<char> > const&,int) @ 0x10519be0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char,int) @ 0x8f5072d in /usr/bin/clickhouse
..

From the documentation on inserting into a Distributed table:

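"...data blocks are just written to the local file system. ... You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: /var/lib/clickhouse/data/database/table/."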

(For details, see the Distributed Table Engine documentation.)
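The check described in that quote can be scripted. This is a minimal Python sketch that assumes the table directory holds one sub-directory per destination (like the default@... folder used below). It also assumes queued files are named *.bin and that rejected files are moved into a broken sub-folder inside it. The function name pending_bin_files is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

def pending_bin_files(table_dir: str) -> Dict[str, List[str]]:
    """Map each destination sub-directory to the .bin files still waiting to be sent."""
    pending: Dict[str, List[str]] = {}
    for sub in sorted(Path(table_dir).iterdir()):
        if not sub.is_dir():
            continue
        # Non-recursive glob: files already moved into sub/"broken" are not listed.
        names = sorted(p.name for p in sub.glob("*.bin") if p.is_file())
        if names:
            pending[sub.name] = names
    return pending
```

For example, pending_bin_files("/var/lib/clickhouse/data/database/table") returns each sub-directory name with its queued files. A list that keeps growing while no rows arrive on the shards is the symptom described here. Reading that directory usually requires root or the clickhouse user.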

Let's look in the /var/lib/clickhouse/data/database/table/ folder and inspect the "pending" .bin files to find the cause of the problem:

sudo vim /var/lib/clickhouse/data/database/table/default@../3757.bin

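The top of the .bin file contains the original SQL query followed by the query settings. In my case, the root cause was readonly set to 1 (see the last line), which prevents the insert from running. (The .bin file format may change in future versions, so your output may look different.)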

εû×^L<88>^H¡©^Cõ^FINSERT INTO database.table({column_list}) VALUES^V
use_uncompressed_cache^@^A0^N
load_balancing^@^F
random^Q
force_primary_key^@^A1^K
log_queries^@^A1^H
readonly^@^A1^P
..
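Instead of opening the file in vim, you can run a strings-style scan over the first few kilobytes to pull out the readable parts of the header. This Python sketch assumes only that the query text and setting names are stored as plain ASCII, as the dump above suggests. It does not parse the format. The helper names and the 4 KiB limit are arbitrary choices.

```python
import re
from typing import List

def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (0x20..0x7e) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]

def header_strings(path: str, limit: int = 4096, min_len: int = 4) -> List[str]:
    """Scan only the first `limit` bytes, where the query and its settings appear."""
    with open(path, "rb") as f:
        return printable_runs(f.read(limit), min_len)
```

Short values such as the 1 after readonly fall below the minimum run length. To read a value, look at the raw bytes next to the setting name you care about.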

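To fix the problem:

  1. Fix the root cause (in my case, define the default profile correctly) and check that a test insert succeeds.

  2. On every node of the cluster:

  • Move the "pending" .bin files from /var/lib/clickhouse/data/database/table/default@../ to /var/lib/clickhouse/user_files/

  • Read the files from ClickHouse and re-insert the data: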

-- check data availability
SELECT count()
FROM file('*.bin', 'Distributed');

-- re-insert data
INSERT INTO database.table
SELECT *
FROM file('*.bin', 'Distributed');
-- or for a single file: FROM file('3757.bin', 'Distributed')
  • Delete the processed "pending" files from /var/lib/clickhouse/user_files/
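The file-handling parts of these per-node steps can also be scripted. This Python sketch uses two hypothetical helpers, move_pending and delete_processed. It moves only the top-level *.bin files of one default@... directory into user_files, so anything already in broken stays put, and it refuses to overwrite a file with the same name. Delete files only after you have confirmed that the INSERT ... SELECT succeeded. The sketch assumes it runs as a user allowed to write both directories, and that the moved files stay readable by the ClickHouse server user.

```python
import shutil
from pathlib import Path
from typing import List

def move_pending(shard_dir: str, user_files: str) -> List[str]:
    """Move the top-level *.bin files of one destination directory into user_files."""
    src, dst = Path(shard_dir), Path(user_files)
    files = sorted(p for p in src.glob("*.bin") if p.is_file())
    # Check every target first so a name clash cannot leave the queue half-moved.
    clashes = [p.name for p in files if (dst / p.name).exists()]
    if clashes:
        raise FileExistsError(f"already in {dst}: {clashes}")
    for p in files:
        shutil.move(str(p), str(dst / p.name))
    return [p.name for p in files]

def delete_processed(user_files: str, names: List[str]) -> None:
    """Remove files from user_files; call only after the INSERT ... SELECT succeeded."""
    for name in names:
        (Path(user_files) / name).unlink()
```

Returning the moved names lets you pass the same list to delete_processed, so only the files you actually re-inserted are removed.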