Mtv Europe (mtve) wrote,
@ 2007-05-12 17:26:00
Replacing a failed drive in a vinum RAID5 array

Two years and one month after I set up a vinum RAID5 array, one disk failed:

Apr 5 15:51:23 heart /kernel: ad4s1h: hard error reading fsbn 483750673 of 241875305-241875336 (ad4s1 bn 483750673; cn 479911 tn 6 sn 7)
Apr 5 15:51:23 heart /kernel: ad4: timeout waiting for cmd=ef s=e0 e=04
{...repeating many times
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=c4 s=e0 e=04
Apr 5 15:51:23 heart /kernel: ad4: error executing command - resetting
Apr 5 15:51:23 heart /kernel: ad4: timeout sending command=ec s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=c6 s=80 e=04
}
Apr 5 15:51:24 heart /kernel: vinum: Can't write config to /dev/ad4s1h, error 5
Apr 5 15:51:24 heart /kernel: ad4: timeout sending command=e7 s=80 e=04
Apr 5 15:51:24 heart /kernel: ad4: flushing cache on close failed
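With this many repeats it helps to tally the kernel messages per device. A minimal sketch, assuming syslog lines shaped like the ones above; the helper name count_ata_errors is made up, and on a real machine the input would be the system log rather than an inline sample.

```shell
#!/bin/sh
# count_ata_errors: hypothetical helper. Reads syslog lines on stdin and
# prints "<device> <count>" for every "adN: ..." kernel message.
count_ata_errors() {
    awk '$5 == "/kernel:" && $6 ~ /^ad[0-9]+:$/ {
             dev = substr($6, 1, length($6) - 1)   # strip the trailing colon
             n[dev]++
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Sample input copied from the log above (the vinum line is not counted,
# because its tag is "vinum:" rather than a disk device).
sample='Apr 5 15:51:23 heart /kernel: ad4: timeout waiting for cmd=ef s=e0 e=04
Apr 5 15:51:23 heart /kernel: ad4: error executing command - resetting
Apr 5 15:51:24 heart /kernel: vinum: Can'\''t write config to /dev/ad4s1h, error 5
Apr 5 15:51:24 heart /kernel: ad4: ATA identify failed'

printf '%s\n' "$sample" | count_ata_errors    # prints: ad4 3
```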

A new IDE WD Caviar 200GB now costs $68 (was $107).

dmesg after the replacement:

ad2: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata1-master UDMA33
ad4: 190782MB <WDC WD2000JB-00REA0> [387621/16/63] at ata2-master UDMA100
ad6: 190782MB <WDC WD2000JB-00GVA0> [387621/16/63] at ata3-master UDMA100
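As a quick sanity check, the size in MB can be recomputed from the [cylinders/heads/sectors] geometry that dmesg prints. This is only a sketch and assumes 512-byte sectors, which dmesg does not state.

```shell
#!/bin/sh
# Geometry as reported by dmesg: [cylinders/heads/sectors-per-track]
cyl=387621
heads=16
spt=63

# Total sector count; 512-byte sectors are an assumption.
sectors=$((cyl * heads * spt))

# 2048 sectors of 512 bytes make one MB (2^20 bytes); integer division
# truncates, just like the figure in dmesg.
mb=$((sectors / 2048))

echo "sectors=$sectors mb=$mb"    # prints: sectors=390721968 mb=190782
```

The result matches the 190782MB that all three drives report.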

'vinum printconfig' shows that it has lost the failed disk:

...
drive b device [nothing here]
...
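Spotting the lost drive by eye is easy with three disks, but it can also be scripted. A rough sketch, assuming each drive line has the form "drive NAME device PATH" as in the excerpt above; the helper name missing_drives and the sample lines for drives "a" and "c" are made up for illustration.

```shell
#!/bin/sh
# missing_drives: hypothetical helper. Reads 'vinum printconfig'-style text
# on stdin and prints the name of every drive whose device field is not a
# /dev/ path (it is empty in the failure shown above).
missing_drives() {
    awk '$1 == "drive" && $3 == "device" && $4 !~ /^\/dev\// { print $2 }'
}

# Sample input: the "drive b" line follows the post; drives "a" and "c"
# and their device paths are assumptions, not real output.
sample='drive a device /dev/ad2s1h
drive b device
drive c device /dev/ad6s1h'

printf '%s\n' "$sample" | missing_drives    # prints: b
```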

So the recovery procedure was like this:

# fdisk -BI ad4			ignore "invalid mbr"
# disklabel -wB ad4s1 auto
# disklabel -e ad4s1		add new partition 'h' equal to 'c' with fstype vinum: 

   8 partitions:
   #        size   offset    fstype   [fsize bsize bps/cpg]
     c: 390721905        0    unused        0     0        # (Cyl.    0 - 387620*)
     h: 390721905        0     vinum                       # (Cyl.    0 - 387620*)

# cat vinum.tmp.conf
drive b device /dev/ad4s1h
# vinum create vinum.tmp.conf
# rm vinum.tmp.conf
# vinum start raid5.p0.s1	which was "stalled"

And now it is rebuilding (approx. 6 hours).
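Two back-of-the-envelope numbers follow from the figures above. This is a sketch, not a measurement: it assumes the whole 190782MB disk is rewritten during the rebuild, and that the 63-sector gap between the raw geometry and the label size is the slice's start offset left by fdisk.

```shell
#!/bin/sh
disk_mb=190782          # per-disk size from dmesg
hours=6                 # approximate rebuild time quoted above

seconds=$((hours * 3600))
# Rate in tenths of MB/s, so that integer arithmetic is enough.
rate_x10=$((disk_mb * 10 / seconds))
echo "approx. $((rate_x10 / 10)).$((rate_x10 % 10)) MB/s written to the new disk"

# Label size versus raw geometry: the difference is exactly one track.
raw=$((387621 * 16 * 63))   # cylinders * heads * sectors from dmesg
label=390721905             # size of partitions 'c' and 'h' in the label
echo "raw - label = $((raw - label)) sectors"
```

This prints a rate of roughly 8.8 MB/s and a difference of 63 sectors.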

Update, two days later: a crash, after more than a year of normal operation.

IdlePTD at physical address 0x00454000
initial pcb at physical address 0x00392600
panicstr: ffs_blkfree: freeing free block
panic messages:
---
panic: ffs_blkfree: freeing free block

syncing disks... 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
giving up on 2 buffers

with stack trace:

#0  0xc01747fa in dumpsys ()
#1  0xc01745c4 in boot ()
#2  0xc01749f8 in poweroff_wait ()
#3  0xc0266dff in ffs_blkfree ()
#4  0xc026b934 in indir_trunc ()
#5  0xc026b6e6 in handle_workitem_freeblocks ()
#6  0xc0269b2b in process_worklist_item ()
#7  0xc02699ba in softdep_process_worklist ()
#8  0xc01a3e6b in sched_sync ()





jeffr_tech
2007-05-13 03:06 am UTC
Vinum is a notoriously bad piece of software. Unfortunately, it was always buggy. It has now been replaced with a compatible piece of code called gvinum. I can't report whether or not that's any better. However, I think it's unlikely that it was the SATA support.



approachmdnight
2007-05-13 03:14 am UTC
I think I may have been using gvinum.

In the end it turned out the SATA card was flaky in Linux as well when using RAID. It was odd since the array would be corrupted almost instantly, but I/O to individual drives worked perfectly.

I've been willing to throw my friend some bucks to get a decent SATA card (as in not some $20 Silicon Image reference design that caught on fire according to one Newegg review) or a motherboard with built-in SATA support. He has been too lazy to pick out parts, so we never got the array working. :P


> Vinum is a notoriously bad piece of software
poige
2007-05-13 04:09 am UTC
That's what I said when mtve first set up this array. :)

But anyway, it may work ok. :)



mtve
2007-05-13 10:07 am UTC

I know, I know.

Only two reasons have kept me with this: it works for me, and I'm just a bit too lazy to upgrade.


