Re: kblockd/1: page allocation failure in 2.6.9

From: James Bottomley
Date: Sun Dec 26 2004 - 10:54:11 EST


On Sat, 2004-12-25 at 23:49 +0100, Frank Steiner wrote:
> - If you suspect the gdth driver is causing the error, it must be some very
> special situation on this host that triggers it. We have 2 other hosts
> with the same ICP vortex GDT8514RZ controller as the host
> where the kblockd message occurred. They all have internal RAID1 disks
> (73GB or 146GB). One is our main NFS server (it has two RAID1 arrays of
> 146GB each) and it has a lot of I/O, sometimes 50GB or more a day with peaks
> up to 200MB per second (reading), and we never saw any kblockd message
> in the logs (I just checked them all).

The kblockd message is just a symptom of the machine running low on
memory and starting to fail normal kernel memory allocations. There's
always a potential for hangs when something can't allocate memory:
usually the code is in the middle of a transaction and just forgets about
it; what should happen (as we just verified SCSI does) is that the
transaction is rolled back and retried.
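
To make the roll-back-and-retry idea concrete, here is a minimal user-space
sketch in C. It is not the SCSI or gdth code: struct request, prepare_request(),
submit_with_retry(), flaky_alloc() and fail_budget are all hypothetical names
invented for illustration, and the allocator is passed in as a hook only so
that a failure can be simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request; none of these names are kernel APIs. */
struct request {
    int   id;
    void *buf;     /* per-request scratch memory, NULL when not prepared */
    bool  queued;  /* true while the request waits on the pending queue */
};

/* Allocator hook so that an allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Number of upcoming allocations that flaky_alloc() will refuse. */
static int fail_budget = 0;

/* Test allocator: fails while fail_budget > 0, then behaves like malloc. */
static void *flaky_alloc(size_t len)
{
    if (fail_budget > 0) {
        fail_budget--;
        return NULL;
    }
    return malloc(len);
}

/* Try to prepare one request. On allocation failure, undo the partial
 * state and leave the request queued so that a later pass can retry it. */
static bool prepare_request(struct request *rq, size_t len, alloc_fn alloc)
{
    assert(rq != NULL && alloc != NULL);

    rq->queued = false;        /* step 1: take it off the queue */
    rq->buf = alloc(len);      /* step 2: may fail under memory pressure */
    if (rq->buf == NULL) {
        rq->queued = true;     /* roll back: put it back for a retry */
        return false;
    }
    return true;
}

/* Retry up to max_tries times. Returns the number of attempts used,
 * or -1 if every attempt failed (the request is then still queued). */
static int submit_with_retry(struct request *rq, size_t len,
                             alloc_fn alloc, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (prepare_request(rq, len, alloc))
            return attempt;
        /* After a failed attempt the request must look untouched. */
        assert(rq->queued && rq->buf == NULL);
    }
    return -1;
}
```

The point of the sketch is only that a failed attempt leaves the request in
the same state it started in, so nothing is forgotten and the retry is safe.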

> - there were no messages "around" the kblockd messages in /var/log/messages
> except the usual ones about remote ssh logins, cron jobs etc., and those
> were all more than 10 minutes "away" before and after the kblockd happened.

That's unfortunate. It means that whatever caused this left no trace.
The best working theory is still a memory allocation failure somewhere.
If it occurs again, could you get a full system process trace
(<alt>-<sysrq>-t) and send that? That might give a better clue as to what
went on.
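
If the console keyboard isn't reachable, the same trace can usually be
requested by writing the command character to /proc/sysrq-trigger as root.
The following is a small sketch of that in C, assuming the kernel was built
with magic SysRq support; sysrq_trigger() is a hypothetical helper name, and
the path is a parameter so the function can be tried against an ordinary file
first.

```c
#include <assert.h>
#include <stdio.h>

/* Write one SysRq command character to the given trigger file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int sysrq_trigger(const char *path, char cmd)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputc(cmd, f) != EOF);
    if (fclose(f) != 0)     /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling sysrq_trigger("/proc/sysrq-trigger", 't') as root should send the task
dump to the kernel log, where dmesg or the syslog files will pick it up.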

James

