Re: Lockup 2.1.6* => kmalloc/slab

Mark Hemment (markhe@nextd.demon.co.uk)
Wed, 12 Nov 1997 10:36:11 +0000 (GMT)


On Tue, 11 Nov 1997, Frank van de Pol wrote:
> Okay, did that, and tried to get it to lock up, as was reproducible with
> SLAB_BREAK_GFP_ORDER set to 2 (default). My old recipe for making the
> machine lock up didn't work; the machine kept running. So I pushed harder
> (more apps running, keeping more pages in use), until it locked up ;-(
>
> Then I changed SLAB_BREAK_GFP_ORDER to 0, and tried again. As expected,
> my machine was more 'prone' to this torturing :-) I was not able to get it
> into the lockup situation, but saw a weird phenomenon:
>
> quoting from syslog....
>
> Nov 11 20:52:53 obelix kernel: kmalloc fails in alloc_skb() in skbuf.c;
> size=65620, priority=0

I cannot find this message in a 2.1.62 kernel. Was it one you added
yourself for debugging?
'priority' 0 is GFP_ATOMIC. Asking for 64K+ as an atomic allocation... no
wonder there is a problem.

OK, dropping the SLAB's gfp-break to 1 is a good idea. But I can't
understand:
1) Why your box locks up.
2) Why there is an allocation for 64K+.

There could be a bug in the SLAB which is causing the lock-up, but
it is also possible that the networking stack doesn't handle a failure
correctly somewhere... or it doesn't handle multiple failures correctly...
or doesn't use GFP_ATOMIC when it should... (The stack is a black hole
for me...)
What can you do after your box locks up? Switch multi-screens? Turn
Num-lock on/off?
Of course, the lock-up may not be in the networking code. The allocations
the code performs may trigger a page-reaping (race) bug somewhere else.
This does sound v. plausible - I'll walk some code paths this evening.
You're not SMP, are you?

markhe