Mtv Europe @ 2007-05-12 17:26:00
Replacing a failed drive in a vinum RAID5 array
Two years and one month after I set up the vinum RAID5 array, one disk failed:
Apr 5 15:51:23 heart /kernel: ad4s1h: hard error reading fsbn 483750673 of 241875305-241875336 (ad4s1 bn 483750673; cn 479911 tn 6 sn 7)
Apr 5 15:51:23 heart /kernel: ad4: timeout waiting for cmd=ef s=e0 e=04
{...repeating many times
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=c4 s=e0 e=04
Apr 5 15:51:23 heart /kernel: ad4: error executing command - resetting
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=ec s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=c6 s=80 e=04
}
Apr 5 15:51:24 heart /kernel: vinum: Can't write config to /dev/ad4s1h, error 5
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=e7 s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: flushing cache on close failed
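With the messages repeating many times, a per-device count makes it easier to see which disk is in trouble. This is only a minimal sketch: the function name `count_kernel_msgs` is made up for illustration, and it assumes the "... /kernel: <device>: <message>" layout seen in the log above.

```shell
#!/bin/sh
# Count kernel messages per reporting device in syslog-style input.
# Assumption: every line looks like "... /kernel: <device>: <message>".
# count_kernel_msgs is a hypothetical helper, not part of any tool.
count_kernel_msgs() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "/kernel:") {
                dev = $(i + 1)
                sub(/:$/, "", dev)    # strip the trailing colon from "ad4:"
                count[dev]++
                break
            }
        }
    }
    END { for (d in count) print count[d], d }' "$@" | sort -k1,1nr -k2,2
}

# Example with three lines taken from the log above.
count_kernel_msgs <<'EOF'
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=c4 s=e0 e=04
Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed
Apr 5 15:51:24 heart /kernel: vinum: Can't write config to /dev/ad4s1h, error 5
EOF
```

The function also accepts file names as arguments, so it could be pointed at a saved copy of the messages file.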
A new 200GB IDE WD Caviar now costs $68 (was $107).
dmesg after the replacement:
ad2: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata1-master UDMA33
ad4: 190782MB <WDC WD2000JB-00REA0> [387621/16/63] at ata2-master UDMA100
ad6: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata3-master UDMA100
'vinum printconfig' shows that the drive entry for the failed disk has lost its device:
... drive b device [nothing here] ...
The recovery procedure went like this:
# fdisk -BI ad4 (ignore the "invalid mbr" message)
# disklabel -wB ad4s1 auto
# disklabel -e ad4s1 (add a new partition 'h' equal to 'c', with fstype vinum:)
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 390721905        0    unused        0     0         # (Cyl.    0 - 387620*)
  h: 390721905        0    vinum                         # (Cyl.    0 - 387620*)
# cat vinum.tmp.conf
drive b device /dev/ad4s1h
# vinum create vinum.tmp.conf
# rm vinum.tmp.conf
# vinum start raid5.p0.s1 (the subdisk that was "stale")
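Two of the steps above only produce text: the 'h' line typed into the disklabel editor, and the one-line config handed to 'vinum create'. A rough sketch of generating both follows. The helper names `make_vinum_partition` and `make_drive_conf` are made up for illustration, and the first one assumes the label contains a 'c:' line in the layout shown above.

```shell
#!/bin/sh
# Print a disklabel line for partition "h" that mirrors partition "c"
# (same size and offset) with fstype "vinum".
# Assumption: the input has a line whose first field is "c:", followed
# by the size and the offset. Hypothetical helper, for illustration only.
make_vinum_partition() {
    awk '$1 == "c:" { printf "  h: %s %s vinum\n", $2, $3 }' "$@"
}

# Print the one-line vinum config that ties a drive name to a device.
# Hypothetical helper, for illustration only.
make_drive_conf() {
    printf 'drive %s device %s\n' "$1" "$2"
}

# Example with the values from this recovery.
echo '  c: 390721905 0 unused 0 0' | make_vinum_partition
make_drive_conf b /dev/ad4s1h
```

In practice the first function would read the output of 'disklabel ad4s1', and the output of the second would be redirected into vinum.tmp.conf.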
Now the array is rebuilding (approx. 6 hours).
Upd. Two days later: a crash, after more than a year of normal operation.
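The six-hour figure gives a rough average rebuild rate. The arithmetic below is a back-of-the-envelope sketch that assumes the subdisk covers practically the whole 'h' partition, that sectors are 512 bytes, and that the shell does 64-bit arithmetic; none of this was measured.

```shell
#!/bin/sh
# Average write rate to the new disk during the rebuild.
# Assumptions: the whole 'h' partition is rewritten, sectors are
# 512 bytes, and the shell supports 64-bit integer arithmetic.
sectors=390721905                  # size of partition h from the disklabel
hours=6                            # approximate rebuild time
bytes=$((sectors * 512))           # 200049615360 bytes
rate_kb=$((bytes / (hours * 3600) / 1000))
echo "about ${rate_kb} kB/s written to the new disk"
```

That works out to roughly 9 MB/s, well below what the UDMA100 link can carry, which would fit a rebuild that has to read the other two disks and compute parity for every block written.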
IdlePTD at physical address 0x00454000
initial pcb at physical address 0x00392600
panicstr: ffs_blkfree: freeing free block
panic messages:
---
panic: ffs_blkfree: freeing free block
syncing disks... 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
giving up on 2 buffers
with stack trace:
#0 0xc01747fa in dumpsys ()
#1 0xc01745c4 in boot ()
#2 0xc01749f8 in poweroff_wait ()
#3 0xc0266dff in ffs_blkfree ()
#4 0xc026b934 in indir_trunc ()
#5 0xc026b6e6 in handle_workitem_freeblocks ()
#6 0xc0269b2b in process_worklist_item ()
#7 0xc02699ba in softdep_process_worklist ()
#8 0xc01a3e6b in sched_sync ()