Re: kblockd/1: page allocation failure in 2.6.9

From: Frank Steiner
Date: Sun Dec 26 2004 - 17:21:01 EST


James Bottomley wrote:

> The kblockd message is just a symptom of the machine running low on
> memory and starting to fail normal kernel memory allocations. There's
> always a potential for hangs when something can't allocate memory:
> usually it's in the middle of a transaction and just forgets about it;
> what should happen (as we just verified SCSI does) is that the
> transaction should be rolled back and retried.

Ah, ok. Now at least I know what these messages mean! Indeed, as I wrote
in the first mail, 1.7GB of the machine's 2GB of memory were in use,
and that was "real" usage, i.e., not counting any buffers. But that was
already three hours after the failure occurred. Unfortunately we didn't
check which process was using how much memory, because we just rebooted
as quickly as possible to get the server processes running again.
It might be a memory leak in some application, because neither the mysql
database nor the tomcat server should take more than a few hundred MB
of memory altogether. I will ask the user to redo the mysql database
dump; maybe that was it and we can trigger the failure again. But that
will take two weeks, until everyone is back in the office :-)
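
For next time, a minimal sketch of how one might record the two numbers
mentioned above before rebooting: memory in use not counting buffers and
cache, and the processes with the largest resident set. It assumes a Linux
/proc with "Key: value kB" lines in /proc/meminfo and a VmRSS line in
/proc/<pid>/status; the function names are made up for this example, and
tools like ps or top report the same information.

```python
import os


def parse_numeric_fields(text):
    """Map 'Key:  123 ...' lines to {key: 123}; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def used_without_buffers_kb(meminfo_text):
    """MemTotal minus free memory, buffers and page cache, in kB."""
    m = parse_numeric_fields(meminfo_text)
    return m["MemTotal"] - m["MemFree"] - m.get("Buffers", 0) - m.get("Cached", 0)


def top_rss(proc_root="/proc", limit=10):
    """Return [(rss_kb, pid, name)] sorted by resident set size, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited (or is unreadable) during the scan
        fields = parse_numeric_fields(text)
        if "VmRSS" not in fields:
            continue  # kernel threads have no VmRSS line
        name = "?"
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        rows.append((fields["VmRSS"], int(entry), name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        print("used without buffers/cache (kB):", used_without_buffers_kb(fh.read()))
    for rss_kb, pid, name in top_rss():
        print(f"{rss_kb:>10} kB  pid {pid:<7} {name}")
```

Saving this output to a file just before the reboot would at least show
whether a single process had grown far beyond its expected size.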


>> - there were no messages "around" the kblockd messages in /var/log/messages
>> but the usual ones about remote ssh login, cron jobs etc., but the messages
>> were all more than 10 minutes "away" before and after the kblockd happened.

> That's unfortunate. It means that whatever caused this left no trace.
> The best working theory is still a memory allocation failure somewhere.
> If it occurs again, could you get a full system process trace
> (<alt>-<sysrq>-t) and send that? That might give a better clue as to
> what went on.

Yes, sure! If we can reproduce it, I will send the log and try to figure
out which process takes so much memory. Thanks for your help!
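
Since the server is remote and nobody may be at the console when it
happens, here is a small sketch of requesting the same task dump without
the keyboard. It assumes the kernel was built with magic SysRq support and
exposes /proc/sysrq-trigger, and that the script runs as root; the function
name is made up for this example. The trace itself goes to the kernel log
(dmesg, and usually /var/log/messages), not to the script.

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Write 't' to the SysRq trigger file, asking the kernel to log all tasks.

    trigger_path is a parameter only so the function can be tried against
    an ordinary file; on a real system the default path is the one to use.
    """
    with open(trigger_path, "w") as fh:
        fh.write("t")
```

Calling request_task_dump() as root should have the same effect as
<alt>-<sysrq>-t on the console, provided SysRq is enabled.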

cu,
Frank

--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: -4054
* You can only understand recursion once you have understood recursion. *