Re: [PATCH] init: use KERNEL_DS when trying to start init process

From: Mathias Krause
Date: Fri Jun 10 2011 - 04:12:05 EST

Next message: Michal Hocko: "Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cachedraining."
Previous message: Dmitry Eremin-Solenikov: "Re: [PATCH] Make GFP_DMA allocations w/o ZONE_DMA emit a warninginstead of failing"
In reply to: Andrew Morton: "Re: [PATCH] init: use KERNEL_DS when trying to start init process"
Next in thread: Mathias Krause: "[PATCH] alpha, exec: remove redundant set_fs(USER_DS)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jun 10, 2011 at 12:56 AM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 9 Jun 2011 20:05:18 +0200
> Mathias Krause <minipli@xxxxxxxxxxxxxx> wrote:
>
>> Subject: [PATCH] exec: delay address limit change until point of no return
>>
>> Unconditionally changing the address limit to USER_DS and not restoring
>> it to its old value in the error path is wrong because it prevents us
>> using kernel memory on repeated calls to this function. This, in fact,
>> breaks the fallback of hard coded paths to the init program from being
>> ever successful if the first candidate fails to load.
>>
>> With this patch applied switching to USER_DS is delayed until the point
>> of no return is reached which makes it possible to have a multi-arch
>> rootfs with one arch specific init binary for each of the (hard coded)
>> probed paths.
>>
>> Since the address limit is already set to USER_DS when start_thread()
>> will be invoked, this redundancy can be safely removed.
>
> A couple of things here, please.
>
> The description doesn't describe the user-visible symptoms of the bug.

For current users there is no visible symptom. Current kernels will
just fail to start init when you try to rely on the fallback mechanism
of the kernel searching for init in different paths when building a
multi-arch rootfs. It just won't work if the first init binary fails
to start.

Maybe related, the bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=36692

> This makes it hard for the -stable maintainers to work out whether they
> should accept the patch and it makes it hard for random distro
> maintainers to determine whether your patch might fix a user bug report
> which they're working on.

I think the first paragraph should have made clear what is wrong with
current kernels and the second one should have pointed out the
possibilities one has with this bug being fixed. It's broken since the
very beginning (Linus pointed to some 2.4.3 kernel) and personally I
would like to have it in the .32 longterm kernel, too.

> Secondly, I understand that we have identified changes which other arch
> maintainers should make and test. Please describe those changes to
> make it easy for them and please also describe a way in which they can
> test that change.

The changes for the other architectures are more of a cleanup than a
bug fix. They're calling set_fs(USER_DS) either in flush_thread() or
start_thread() which is unnecessary because they're already running
under USER_DS. But they did so even before my patch, so no "visible
changes" here. I've some follow up patches in the making to remove
those set_fs() calls but fall asleep last night so didn't finish them,
yet. Maybe later the day.

> Both these things could be addressed using a description of some
> testcase.

That's true -- I owe you a test case, so here it is:

Assume you want to build a multi-arch rootfs, e.g. combined 32 and 64
bit x86 (without CONFIG_IA32_EMULATION) or x86 + ARM + SPARC to have
one rootfs usable on a couple of different machines just by using a
different kernel. You could do this by providing a statically linked
init binary for each architecture and place them under /sbin/init,
/etc/init, /bin/init and /bin/sh. The kernel will fail to start
binaries not made for its native architecture but it should be able to
start the init binary build for its architecture. This is what is
currently broken and can be verified with the following test (assuming
a x86-64 environment):

cat<<EOT >hello.c
#include <unistd.h>
#include <stdio.h>

int main(void) {
printf("Hello %s world!\n", sizeof(int) == sizeof(long) ? "32" : "64");
pause();

return 0;
}
EOT

mkdir -p rootfs/etc rootfs/bin
gcc -static -m64 hello.c -o rootfs/etc/init
gcc -static -m32 hello.c -o rootfs/bin/init

cat<<EOT | gen_init_cpio - | gzip > initfs.gz
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init /dev/null 0644 0 0
dir /sbin 0755 0 0
file /sbin/init /dev/null 0755 0 0
dir /etc 0755 0 0
file /etc/init rootfs/etc/init 0755 0 0
dir /bin 0755 0 0
file /bin/init rootfs/bin/init 0755 0 0
EOT

This generates an initramfs that won't boot on a current kernel
(3.0-rc2) but will, with the above patch applied.

Just some notes regarding the rootfs layout:
* The dummy /init file is needed, so the kernel won't call
prepare_namespace(). Otherwise it won't accept the initramfs as its
rootfs.
* The empty (but executable!) file /sbin/init should be the binary for
the "wrong" architecture, so won't execute. This should make the
kernel go ahead and try /etc/init.
* If you're running a 64 bit kernel /etc/init should start and print
out "Hello 64 bit world!", otherwise the kernel should fail to start
this binary and go ahead to /bin/init.
* Finally, if /etc/init failed, /bin/init should start and print out
"Hello 32 bit world!".

To use this test case for other architectures then x86 just skip the
etc/init file and remove the -m32 compiler switch. The dummy file
/sbin/init should make current kernels fail and kernels with the patch
applied succeed.

Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Michal Hocko: "Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cachedraining."
Previous message: Dmitry Eremin-Solenikov: "Re: [PATCH] Make GFP_DMA allocations w/o ZONE_DMA emit a warninginstead of failing"
In reply to: Andrew Morton: "Re: [PATCH] init: use KERNEL_DS when trying to start init process"
Next in thread: Mathias Krause: "[PATCH] alpha, exec: remove redundant set_fs(USER_DS)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]