Mtv Europe (mtve) wrote,
@ 2007-05-12 17:26:00
Replacing a failed drive in a vinum RAID5 array

Two years and one month after I set up a vinum RAID5 array, one disk failed:

Apr 5 15:51:23 heart /kernel: ad4s1h: hard error reading fsbn 483750673 of 241875305-241875336 (ad4s1 bn 483750673; cn 479911 tn 6 sn 7)
Apr 5 15:51:23 heart /kernel: ad4: timeout waiting for cmd=ef s=e0 e=04
{...repeating many times
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=c4 s=e0 e=04
Apr 5 15:51:23 heart /kernel: ad4: error executing command - resetting
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=ec s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=c6 s=80 e=04
}
Apr 5 15:51:24 heart /kernel: vinum: Can't write config to /dev/ad4s1h, error 5
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=e7 s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: flushing cache on close failed

A new 200GB IDE WD Caviar now costs $68 (it was $107).

dmesg after the replacement:

ad2: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata1-master UDMA33
ad4: 190782MB <WDC WD2000JB-00REA0> [387621/16/63] at ata2-master UDMA100
ad6: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata3-master UDMA100
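
As a sanity check on the dmesg lines, the reported size follows from the CHS geometry in brackets. A minimal sketch, assuming 512-byte sectors and a shell with 64-bit arithmetic (bash, for example); the function names are made up for illustration:

```shell
# Hypothetical helpers, not part of vinum or FreeBSD.

# Total sectors from a cylinders/heads/sectors-per-track geometry.
chs_sectors() {
    echo $(( $1 * $2 * $3 ))
}

# Size in MB (2^20 bytes), assuming 512-byte sectors, rounded down.
sectors_to_mb() {
    echo $(( $1 * 512 / 1048576 ))
}

total=$(chs_sectors 387621 16 63)
echo "$total sectors, $(sectors_to_mb "$total") MB"
```

This prints "390721968 sectors, 190782 MB", which matches dmesg. The 'c' partition in the disklabel further down is 390721905 sectors, 63 fewer, presumably because the slice starts after the first track.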

'vinum printconfig' shows that vinum has lost the failed disk:

...
drive b device [nothing here]
...

So the recovery procedure was as follows:

# fdisk -BI ad4			ignore "invalid mbr"
# disklabel -wB ad4s1 auto
# disklabel -e ad4s1		add new partition 'h' equal to 'c' with fstype vinum: 

   8 partitions:
   #        size   offset    fstype   [fsize bsize bps/cpg]
     c: 390721905        0    unused        0     0        # (Cyl.    0 - 387620*)
     h: 390721905        0     vinum                       # (Cyl.    0 - 387620*)

# cat vinum.tmp.conf
drive b device /dev/ad4s1h
# vinum create vinum.tmp.conf
# rm vinum.tmp.conf
# vinum start raid5.p0.s1	which was "stalled"

And now it is rebuilding (approx. 6 hours).
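
The procedure above shows the contents of vinum.tmp.conf but not how the file was written. One way to do it, as a sketch only: the helper name write_drive_conf is hypothetical, and any editor works just as well.

```shell
# Hypothetical helper: write a one-line vinum drive definition to a file.
# Arguments: drive name, device path, output file.
write_drive_conf() {
    printf 'drive %s device %s\n' "$1" "$2" > "$3"
}

# Same values as in the procedure above.
write_drive_conf b /dev/ad4s1h vinum.tmp.conf
cat vinum.tmp.conf
```

The resulting file is what gets passed to 'vinum create'. Only the drive line is needed, because the rest of the configuration is still known to vinum.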

Upd. Two days later: a crash, after more than a year of normal operation.

IdlePTD at physical address 0x00454000
initial pcb at physical address 0x00392600
panicstr: ffs_blkfree: freeing free block
panic messages:
---
panic: ffs_blkfree: freeing free block

syncing disks... 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
giving up on 2 buffers

with this stack trace:

#0  0xc01747fa in dumpsys ()
#1  0xc01745c4 in boot ()
#2  0xc01749f8 in poweroff_wait ()
#3  0xc0266dff in ffs_blkfree ()
#4  0xc026b934 in indir_trunc ()
#5  0xc026b6e6 in handle_workitem_freeblocks ()
#6  0xc0269b2b in process_worklist_item ()
#7  0xc02699ba in softdep_process_worklist ()
#8  0xc01a3e6b in sched_sync ()




> Vinum is a notoriously bad piece of software
poige
2007-05-13 04:09 am UTC (link)
That's what I said when mtve first ran this array. :)

But anyway, it may work ok. :)
