Re: Problem with 2.6 kernel and lots of I/O

From: Roy Keene
Date: Mon Jun 20 2005 - 17:55:09 EST


All,

Actually, the problem I have isn't specific to the using it over the local device. Quite often I have the problem where the secondary node goes down and comes back up after some time and needs to be resyncd. This is done on the master (raid1_resync) by hot-removing /dev/nbd1 and then hot-adding it back.

The result ? The slave node becomes completely unusable despite the fact that only nbd-server processes (two, the listener and the accepted socket) are running on there and nothing in the kernel context (well, at least w.r.t. to nbd, obviously some kernel code is involved ! :-P, but the nbd module doesn't even have to be loaded). And by unusable I mean I can no longer open files for writing, attempting to do so results in a hang until the resync is complete.

This is not-so-bad when the slave is being resync'd as the primary is still fully usable, but it really sucks when the primary goes down and needs to be resync'd from the secondary upon coming back up.

I'm thinking my system disks' RAID controller may be really horrible, or horribly supported. I have a RAID5 (hardware, uses the megaraid_mbox driver) of 3 x 73gb 10K RPM SCSI-320 disks and my write performance is .. horrible.

I've looked at "drbd" and it looks very promising, but I haven't had a chance to implement it yet, but it promises to resolve my resync time issues at least.


Roy Keene
Planning Systems Inc.

On Mon, 6 Jun 2005, Kyle Moffett wrote:

On Jun 5, 2005, at 06:11:02, Erik Slagter wrote:
On Wed, 2005-06-01 at 21:59 +0200, Pavel Machek wrote:

Start RAID in degraded mode with remote device (nbd1)
Hot-add local device (nbd0)

Stop right here. You may not use nbd over loopback.

Any specific reason (just curious)?

IIRC, because of the way the loopback delivers packets from the
same context as they are sent, it is possible (and quite easy)
to either deadlock or peg the CPU and make everything hang and
be unuseable. DRBD likewise used to have problems with testing
over the loopback until they added a special configuration
option to be extra careful and yield CPU.

Cheers,
Kyle Moffet


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/