Re: Lockup 2.1.6* => kmalloc/slab

Frank van de Pol (F.K.W.van.de.Pol@Inter.NL.net)
Thu, 13 Nov 1997 01:29:58 +0100 (MET)


Mark, I'll add my comments in the text below:

Mark Hemment wrote:
>
> On Tue, 11 Nov 1997, Frank van de Pol wrote:
> > Okay, did that, and tried to get it to lock up, as was reproducible with
> > SLAB_BREAK_GFP_ORDER set to 2 (default). My old recipe for making the
> > machine lock up didn't work, the machine kept running. So I pushed harder
> > (more apps running, keeping more pages in-use), until it locked up ;-(
> >
> > Then I changed the SLAB_BREAK_GFP_ORDER to 0, and tried again. As expected
> > my machine was more 'prone' to this torturing :-) I was not able to get it
> > into the lockup situation, but saw a weird phenomenon:
> >
> > quoting from syslog....
> >
> > Nov 11 20:52:53 obelix kernel: kmalloc fails in alloc_skb() in skbuf.c;
> > size=65620, priority=0
>
> I cannot find this message in a 2.1.62 kernel, was it one you added
> yourself for debugging?

This message is one of the printk's I added to track down the lockup.

> 'priority' 0 is GFP_ATOMIC. Asking for +64K as an atomic allocation....no
> wonder there is a problem....

It's the net stack requesting these things... well eh ...
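
For reference, a rough sketch of why that size is awkward. It assumes 4 KB
pages (as on i386) and a page allocator that hands out power-of-two runs of
contiguous pages; under those assumptions a 65620-byte request needs 17 pages
and rounds up to an order-5 block (32 contiguous pages, 128 KB). The names
below are made up for illustration and are not kernel functions.

```c
#include <stddef.h>

/* Assumption: 4 KB pages, as on i386. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper (not a kernel function): return the smallest
 * order n such that 2^n pages are enough to hold 'size' bytes.
 */
static unsigned int sketch_order_for_size(unsigned long size)
{
    /* Round the byte count up to whole pages. */
    unsigned long pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned int order = 0;

    /* Grow the order until the power-of-two page run covers the request. */
    while ((1UL << order) < pages)
        order++;
    return order;
}

/* Hypothetical helper: bytes actually taken by a block of the given order. */
static unsigned long sketch_bytes_for_order(unsigned int order)
{
    return (1UL << order) * SKETCH_PAGE_SIZE;
}
```

With these helpers, sketch_order_for_size(65620) gives order 5, while a request
of exactly 65536 bytes would still fit in order 4. The extra 84 bytes push the
request into the next power of two, and finding that many free contiguous
pages without being allowed to sleep is much less likely on a loaded 32 MB box.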

>
> OK, dropping the SLAB's gfp-break to 1 is a good idea. But I can't

I even dropped it to 0 to make the probability of success 1.0.

> understand;
> 1) Why your box locks up.

It does recover from the lock after a few seconds...

> 2) Why an allocation for +64K.

Don't know, perhaps the network guys know (Allen?). At that point I was doing
a download of multiple 60MB files over my web server.

>
> There could be a bug in the SLAB which is causing the lock-up, but
> it is also possible that the networking stack doesn't handle a failure
> correctly somewhere...or it doesn't handle multi-failures correctly...or
> doesn't use GFP_ATOMIC when it should... (The stack is a black-hole for
> me...).
> What can you do after your box locks? Switch multi-screens? Turn the
> Num-lock on/off?

I can switch screens, use the Alt-Sysreq functions (print tasks/mem etc.)
AS LONG AS I'M RUNNING ON A VC. X windows is _dead_.

> Of course, the lock-up may not be in the networking code. The allocations
> the code performs may trigger a page-reaping (race) bug somewhere else. This
> does sound v. plausible - I'll walk some code paths this evening. You're
> not SMP are you?

Nope, I'm running on a single Pentium 60 MHz (the old ones, still on 5 volts
with an absurdly big, active heat sink); 32 MB RAM.

Regards,

Frank.

========================---------------->
#define NAME "Frank van de Pol"
#define ADDRESS "mgr. Nelislaan 10"
#define CITY "4741 AB Hoeven"
#define COUNTRY "The Netherlands"
#define EMAIL "F.K.W.van.de.Pol@inter.NL.net"

Linux - Why use Windows, since there is a door?