Re: Runaway loop with the current git.

From: Kay Sievers
Date: Sun Dec 07 2008 - 17:26:56 EST

Next message: Anders Kaseorg: "[PATCH] Don't call snprintf() with overlapping input and output."
Previous message: Eric Dumazet: "[PATCH] atomic: fix a typo in atomic_long_xchg()"
In reply to: Alan Cox: "Re: Runaway loop with the current git."
Next in thread: Theodore Tso: "Re: Runaway loop with the current git."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, Dec 7, 2008 at 21:00, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> wrote:
>> > max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
>> > atomic_inc(&kmod_concurrent);
>> > if (atomic_read(&kmod_concurrent) > max_modprobes) {
>> >         /* We may be blaming an innocent here, but unlikely */
>> >         if (kmod_loop_msg++ < 5)
>> >                 printk(KERN_ERR
>> >                        "request_module: runaway loop modprobe %s\n",
>> >                        module_name);
>> >         atomic_dec(&kmod_concurrent);
>> >         return -ENOMEM;
>> > }
>> >
>> > Happy now. Print it out, share it with friends, find someone who can read
>> > C if you are stuck.
>>
>> It does not work, that's all.
>
> It works for me.

It runs in a loop here, just as $subject and bko#12153 say.

>> Reproduce the bug and look at it for yourself.
>
> Well since you've got a reproducer and this code works for me (I've tested
> it just fine), why don't you go and reproduce the problem then post a fix
> to that code I quoted instead of all this reordering rubbish. If you fix
> this code not only won't you risk all the mess from re-ordering
> initialisations around the kernel but you'll fix non console related
> looping which you imply is also broken as you claim that code doesn't
> work for you.
>
> If I deliberately break my module utils I see a sequence of modprobes
> which then hits kmod_concurrent limit then causes a -ENOMEM back to
> userspace which then fails the file open. The bug report also shows the
> printk is displayed so the runaway loop *was* detected and the code paths
> taken which stopped the loop.

Sure, I can try to find out why the limiter does not work here.
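
For reference, here is a rough userspace model of the limiter quoted
above. It is only a sketch, not the kernel code: request_module_model(),
in_flight, and the nested_opens parameter are made-up names, C11 atomics
stand in for atomic_inc()/atomic_read(), and the nested call stands in
for a modprobe that re-enters the kernel by opening a device that has no
driver yet.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical counterpart of kmod_concurrent: requests in flight. */
static atomic_int in_flight;

/*
 * Model of one module request. 'nested_opens' is how many more times the
 * helper would re-enter before finishing on its own; a runaway loop is
 * simply a value larger than 'max_concurrent'.
 */
static int request_module_model(int max_concurrent, int nested_opens)
{
    int ret = 0;

    /* Count ourselves in, then check the limit, as the quoted code does. */
    if (atomic_fetch_add(&in_flight, 1) + 1 > max_concurrent) {
        atomic_fetch_sub(&in_flight, 1);
        return -ENOMEM;         /* limiter trips, request refused */
    }

    /* The helper opens the device again, which requests the module again. */
    if (nested_opens > 0)
        ret = request_module_model(max_concurrent, nested_opens - 1);

    atomic_fetch_sub(&in_flight, 1);
    return ret;
}
```

In this model a shallow chain completes normally, while a chain deeper
than the limit unwinds with -ENOMEM at every level. That is the behaviour
Alan describes; it says nothing about why the real limiter misbehaves
here.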

> I get open -> modprobe -> open -> modprobe -> open -> modprobe ... ->
> open fail, then open fail, open fail, open fail, open fail back to the
> first modprobe exiting.
>
> Your proposal to keep the current recent modprobe parameter strings would
> shorten the amount of recursion but it wouldn't change the result that I
> can see. If I open /dev/console early and wrongly from a modprobe then I
> ultimately get a failing open just as I should do.

No, my proposal would prevent the kernel from calling modprobe at all
for the special built-in device 5:1. It shortens the recursion to zero,
which sounds like the right fix. It's really weird to fix the symptoms
instead of the real problem.

The kernel forks a binary with a broken environment. Even the
in-kernel-tree cpio generator creates /dev/console, and accessing it
from a kernel-forked binary makes it crash. The kernel must provide a
sane environment for whatever it calls; I think that is pretty
obvious.

Device 5:1 is a core device for which it never makes sense to call
modprobe. No later driver will ever register that dev_t, so we should
register it before calling out to other random userspace stuff which
triggers the kernel to go crazy with its own devices.

My proposal was to connect 5:1 to the kobject map at the same time as
we register the whole tty class. We allocate the class, create sysfs
entries, and run /sbin/modprobe at that time, so why shouldn't we just
register the dev_t at that time too?
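
To make the ordering argument concrete, here is a small userspace
sketch. Everything in it is hypothetical (DEV_MODEL(), the flat
registered[] array standing in for the kobject map, open_dev_model(),
the modprobe_calls counter); it only models the idea that a lookup which
succeeds never falls back to userspace.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dev_t encoding for this model only. */
#define DEV_MODEL(major, minor) (((unsigned)(major) << 20) | (unsigned)(minor))
#define MAX_DEVS 16

static unsigned registered[MAX_DEVS];  /* stand-in for the kobject map */
static size_t nr_registered;
static int modprobe_calls;             /* how often we called out to userspace */

static bool register_dev_model(unsigned dev)
{
    if (nr_registered == MAX_DEVS)
        return false;
    registered[nr_registered++] = dev;
    return true;
}

static bool lookup_dev_model(unsigned dev)
{
    for (size_t i = 0; i < nr_registered; i++)
        if (registered[i] == dev)
            return true;
    return false;
}

/* Open a char device: only an unknown dev_t triggers the modprobe fallback. */
static int open_dev_model(unsigned dev)
{
    if (lookup_dev_model(dev))
        return 0;
    modprobe_calls++;                  /* model of the request_module() call */
    return lookup_dev_model(dev) ? 0 : -ENXIO;
}
```

If 5:1 is opened before it is registered, the model calls out once and
the open fails. If it is registered first, the open succeeds and the
counter does not move, which is the "recursion shortened to zero" case.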

It properly prevents the needless userspace "driver searching" for a
well-known, already created, but not properly registered core device.

It makes the process environment sane, and prevents the root cause of
the bug, and does not just limit the damage.

Thanks,
Kay
