3x 1TB SSD ZFS mirror: two drives are faulted and the third also has errors. How do I troubleshoot this?

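Problem description

My laptop is a Dell Precision 7740 with five SSDs installed: one is the Windows 10 system disk, one is the system disk for Ubuntu 20.04 (my daily operating system), and the other three are 1 TB SSDs forming a mirrored ZFS pool that stores my data.
After setting it up, I used it for a year without checking its status, because all three SSDs were new and in good condition when I got them.

Recently something went wrong, so I ran sudo zpool status -v and found that two of the drives were faulted and the remaining one also had many errors.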

sudo zpool status -v

  pool: tankmain
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 91.2G in 0 days 00:20:39 with 0 errors on Wed Feb 10 17:30:23 2021
config:

    NAME                                      STATE     READ WRITE CKSUM
    tankmain                                  DEGRADED     0     0     0
      mirror-0                                DEGRADED    28     0     0
        nvme-PLEXTORPX-1TM9PGN+_P02952500057  DEGRADED    47     0   220  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500063  FAULTED     32     0     2  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500220  FAULTED     22     0     3  too many errors
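
The device table above can also be read programmatically. The following is only a minimal sketch, assuming the output keeps the NAME/STATE/READ/WRITE/CKSUM column order shown here; the names `Vdev`, `parse_config` and `suspicious` are hypothetical helpers and not part of ZFS.

```python
from typing import List, NamedTuple


class Vdev(NamedTuple):
    """One row of the config table; counters are kept as the raw strings."""
    name: str
    state: str
    read: str
    write: str
    cksum: str


# Device states that can appear in the STATE column.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_config(status_text: str) -> List[Vdev]:
    """Collect every line that looks like a device row of `zpool status`."""
    rows = []
    for line in status_text.splitlines():
        fields = line.split()
        # A device row has at least five fields and a known state in column two.
        if len(fields) >= 5 and fields[1] in STATES:
            rows.append(Vdev(*fields[:5]))
    return rows


def suspicious(rows: List[Vdev]) -> List[Vdev]:
    """Rows that are not ONLINE or that report any non-zero error counter."""
    return [
        r for r in rows
        if r.state != "ONLINE" or (r.read, r.write, r.cksum) != ("0", "0", "0")
    ]


SAMPLE = """\
NAME STATE READ WRITE CKSUM
tankmain DEGRADED 0 0 0
mirror-0 DEGRADED 28 0 0
nvme-PLEXTORPX-1TM9PGN+_P02952500057 DEGRADED 47 0 220 too many errors
nvme-PLEXTORPX-1TM9PGN+_P02952500063 FAULTED 32 0 2 too many errors
nvme-PLEXTORPX-1TM9PGN+_P02952500220 FAULTED 22 0 3 too many errors
"""

for row in suspicious(parse_config(SAMPLE)):
    print(row.name, row.state, row.read, row.write, row.cksum)
```

For this first output every row is flagged, because none of the entries is ONLINE.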

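I can't believe these SSDs have all stopped working. I checked the SMART information and found nothing abnormal. Data units written is about 70 TB, against a rated endurance of 640 TB. I checked my data and some of it was corrupted. Then I rebooted the laptop. After the reboot: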

sudo zpool status -v

  pool: tankmain
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Feb 14 15:38:53 2021
        40.5G scanned at 251M/s, 11.1G issued at 68.8M/s, 755G total
        23.1G resilvered, 1.47% done, 0 days 03:04:30 to go
config:

    NAME                                      STATE     READ WRITE CKSUM
    tankmain                                  DEGRADED     0     0     0
      mirror-0                                DEGRADED     0     0     0
        nvme-PLEXTORPX-1TM9PGN+_P02952500057  DEGRADED     0     0     0  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500063  ONLINE       0     0     7  (resilvering)
        nvme-PLEXTORPX-1TM9PGN+_P02952500220  ONLINE       0     0    11  (resilvering)

After the resilver completed:

sudo zpool status -v

  pool: tankmain
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 83.5G in 0 days 00:10:26 with 0 errors on Sun Feb 14 15:49:19 2021
config:

    NAME                                      STATE     READ WRITE CKSUM
    tankmain                                  DEGRADED     0     0     0
      mirror-0                                DEGRADED     0     0     0
        nvme-PLEXTORPX-1TM9PGN+_P02952500057  DEGRADED     2     0     9  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500063  ONLINE       0     0    15
        nvme-PLEXTORPX-1TM9PGN+_P02952500220  ONLINE       0     0    19

errors: No known data errors

Then I scrubbed this zpool:

sudo zpool status -v

  pool: tankmain
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device repaired.
  scan: scrub in progress since Sun Feb 14 15:56:49 2021
        209G scanned at 1.76G/s, 903M issued at 7.59M/s, 755G total
        849K repaired, 0.12% done, no estimated completion time
config:

    NAME                                      STATE     READ WRITE CKSUM
    tankmain                                  DEGRADED     0     0     0
      mirror-0                                DEGRADED     0     0     0
        nvme-PLEXTORPX-1TM9PGN+_P02952500057  DEGRADED     3     0     9  too many errors (repairing)
        nvme-PLEXTORPX-1TM9PGN+_P02952500063  FAULTED     32     0 1.90K  too many errors (repairing)
        nvme-PLEXTORPX-1TM9PGN+_P02952500220  FAULTED     64     0   419  too many errors (repairing)

After the repair finished:

sudo zpool status -v

  pool: tankmain
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 970K in 0 days 00:29:42 with 213 errors on Sun Feb 14 16:26:31 2021
config:

    NAME                                      STATE     READ WRITE CKSUM
    tankmain                                  DEGRADED     0     0     0
      mirror-0                                DEGRADED   168     0     0
        nvme-PLEXTORPX-1TM9PGN+_P02952500057  DEGRADED   327     0 2.34K  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500063  FAULTED     32     0  690K  too many errors
        nvme-PLEXTORPX-1TM9PGN+_P02952500220  FAULTED     64     0  682K  too many errors

Some errors were repaired, but most were not.
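
To compare counters such as 1.90K and 690K between runs, the abbreviated values need to be turned back into numbers. The sketch below assumes the K/M/G/T suffixes are powers of 1024 (an assumption about how ZFS abbreviates numbers, so treat the results as approximate) and that the scan line has the same wording as in the output above; `parse_count` and `scrub_summary` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Assumption: suffixes are binary multiples (K = 1024, M = 1024**2, ...).
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_count(text: str) -> int:
    """Convert an abbreviated value such as '690K' to an approximate integer."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised counter: {text!r}")
    return round(float(match.group(1)) * _SUFFIX[match.group(2)])


# Matches a finished-scrub line like the 'scan:' line in the output above.
_SCAN_RE = re.compile(r"scrub repaired (\S+) in .* with (\d+) errors")


def scrub_summary(scan_line: str) -> Optional[Tuple[int, int]]:
    """Return (approximate bytes repaired, unrepaired error count), or None."""
    match = _SCAN_RE.search(scan_line)
    if match is None:
        return None
    return parse_count(match.group(1)), int(match.group(2))


line = ("scan: scrub repaired 970K in 0 days 00:29:42 "
        "with 213 errors on Sun Feb 14 16:26:31 2021")
print(scrub_summary(line))
print(parse_count("690K") - parse_count("1.90K"))
```

The second print gives a rough idea of how much the checksum counter of P02952500063 grew between the in-progress scrub and the finished one.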

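Then I rebooted and disabled P02952500057 in the BIOS. This time Ubuntu booted with only two of the disks attached, and I could read and write data normally; all the data was still there, with no corruption. However, P02952500063 was still DEGRADED, while P02952500220 was ONLINE. After a reboot both start out ONLINE, but after a manual scrub P02952500063 becomes DEGRADED again. A scrub detects some errors and repairs all of them successfully, yet running another scrub detects errors again, which are again all repaired. It looks as if only one disk is actually in use, and a manual scrub syncs P02952500063 and P02952500220 once.

I moved my data to a safe place, destroyed this zpool, and created a new one, which now works without errors. But I am still worried that this failure will happen again.

How can I find the cause? Are there any suggestions for preventing this from happening again?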
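
Since the pool ran for a year without a status check, one way to notice such errors earlier might be a small periodic check. This is only a sketch, not a diagnosis of the cause: it assumes `zpool status -x` prints the line "all pools are healthy" when nothing is wrong, and `is_healthy` and `check_pools` are hypothetical names.

```python
import subprocess

# Assumption: this is the exact line `zpool status -x` prints for healthy pools.
HEALTHY_MESSAGE = "all pools are healthy"


def is_healthy(status_x_output: str) -> bool:
    """True if the text is the 'everything is fine' message and nothing else."""
    return status_x_output.strip() == HEALTHY_MESSAGE


def check_pools() -> bool:
    """Run `zpool status -x` and report whether all pools look healthy."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_healthy(result.stdout)
```

Calling `check_pools()` from a scheduled job (for example cron) and sending a notification when it returns False would surface a DEGRADED pool much sooner than a manual check once a year.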
