Re: kernel panic - not syncing: out of memory and no killable processes

From: David Rientjes
Date: Fri Sep 18 2009 - 15:58:18 EST


On Fri, 18 Sep 2009, Eric Paris wrote:

> > Isolating udevd down to an interactivity scheduling change isn't _that_
> > bizarre. I think the setting of UDEVD_PRIORITY is already mostly
> > arbitrary anyway and it'll allow 192 children on your 512M machine by
> > default unless you changed UDEVD_MAX_CHILDS for uid 0.
> >
> > The default timeout for idle workers is 3 seconds, which may just happen
> > to be long enough to panic your machine because of low memory. If that's
> > the case, I don't believe that it's a scheduler issue but rather a root
> > abuse of setting all udevd threads to be OOM_DISABLE.
> >
> > What is your udevd --version? The latest is udev-146 released last month.
>
> 145
>
> Let me try and clone the vm some I don't break my reproducer. I'll see
> if adding more memory fixes it. Doesn't look like Fedora has built a
> -146 yet, I'll see if I can get one of those as well.
>
> udev bug, configuration issue, whatever, or not, it's a regression that
> I used to be able to boot and updating my kernel leaves me unable to
> boot. I think we all agree when 512M of memory isn't enough to boot to
> runlevel 3 we've got a problem :)
>

I totally agree, and my hypothesis is that the idle child workers are not
being killed in time, so they quickly accumulate toward UDEVD_MAX_CHILDS;
when the oom killer is then invoked because of a write to shared memory, it
can't kill any of those threads either, since udevd sets them all to
OOM_DISABLE and everything else is an unkillable kthread.
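
For reference, opting a task out of the oom killer on these kernels is just
a matter of writing OOM_DISABLE (-17) to its oom_adj file; a rough sketch of
the idea (not udevd's actual code) looks something like this:

/* Sketch only: roughly how a daemon opts itself out of the oom killer
 * on kernels of this era by writing OOM_DISABLE (-17) to
 * /proc/self/oom_adj.  udevd does the equivalent for its workers, which
 * is why the oom killer is left with nothing it is allowed to kill. */
#include <stdio.h>

static void oom_disable_self(void)
{
	FILE *f = fopen("/proc/self/oom_adj", "w");

	if (!f)
		return;
	fprintf(f, "-17\n");	/* OOM_DISABLE */
	fclose(f);
}

Since oom_adj is inherited across fork(), every worker spawned after that
write is invisible to the oom killer from the start.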

Bisecting that to a scheduler change would suggest that each udevd thread
isn't returning from its poll() timeout fast enough; there's essentially a
race between udevd killing its own threads off once the poll timeout has
expired and all of your memory being used up and the machine panicking.
The scheduling change seems to have slowed down the former.
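
The worker idle path I have in mind looks roughly like the following (a
sketch, not udevd's code): each worker sits in poll() with a ~3 second
timeout and exits once it has been idle that long, so if the scheduler is
slow to run it after the timeout fires, the exit path loses the race:

/* Sketch only: an idle worker waits for the next event with a short
 * poll() timeout and exits once it has been idle too long.  If the
 * worker isn't scheduled promptly after the timeout expires, idle
 * workers pile up instead of exiting. */
#include <poll.h>
#include <stdlib.h>

#define IDLE_TIMEOUT_MS	3000	/* the ~3 second idle timeout */

static void worker_loop(int event_fd)
{
	struct pollfd pfd = { .fd = event_fd, .events = POLLIN };

	for (;;) {
		int ret = poll(&pfd, 1, IDLE_TIMEOUT_MS);

		if (ret == 0)		/* idle too long: exit */
			exit(EXIT_SUCCESS);
		if (ret > 0 && (pfd.revents & POLLIN)) {
			/* read and handle the queued event here */
		}
	}
}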

UDEVD_MAX_CHILDS defaults to 192 on your 512M machine unless overridden by
an environment variable of the same name, so you may find it helpful to
reduce this to a saner value. I'd suggest a value lower than the number
of udevd threads that were shown in your latest oom killer dump.
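
If you want a quick look at how many udevd tasks are actually around while
memory is getting tight, something like the following (just a sketch that
scans /proc, nothing udev-specific) would do it:

/* Sketch only: count tasks whose Name: line in /proc/<pid>/status
 * contains "udevd", to compare against the worker count in the oom
 * killer dump. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static int count_udevd_tasks(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int count = 0;

	if (!proc)
		return -1;
	while ((de = readdir(proc)) != NULL) {
		char path[300], line[128];
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(line, sizeof(line), f) &&
		    strncmp(line, "Name:", 5) == 0 &&
		    strstr(line, "udevd"))
			count++;
		fclose(f);
	}
	closedir(proc);
	return count;
}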

If that turns out to fix the issue for you, perhaps max_childs needs to be
calculated more conservatively in the userspace package, since every one of
those threads also comes with the caveat of being OOM_DISABLE.
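
Something along these lines, scaling the limit off MemTotal with a hard
cap, is what I have in mind; the divisor and the cap below are made up for
illustration and are not udev's current formula:

/* Sketch only: a hypothetical, more conservative max_childs heuristic
 * derived from MemTotal in /proc/meminfo.  The "one worker per 8M,
 * capped at 64" numbers are illustrative, not udev's actual defaults. */
#include <stdio.h>

static int conservative_max_childs(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[256];
	unsigned long mem_kb = 0;
	int max_childs;

	if (f) {
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "MemTotal: %lu kB", &mem_kb) == 1)
				break;
		}
		fclose(f);
	}

	/* e.g. one worker per 8M of RAM: a 512M box gets 64, not 192 */
	max_childs = mem_kb / (8 * 1024);
	if (max_childs < 8)
		max_childs = 8;
	if (max_childs > 64)
		max_childs = 64;
	return max_childs;
}

With numbers like those, your 512M machine would end up with 64 workers
instead of 192, which is at least harder to panic when every one of them is
OOM_DISABLE.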