Re: [PATCH] binfmt_elf: Support segments with 0 filesz and misaligned starts

From: Eric W. Biederman
Date: Mon Sep 25 2023 - 23:27:41 EST


Kees Cook <keescook@xxxxxxxxxxxx> writes:

> On Mon, Sep 25, 2023 at 05:27:12PM +0200, Sebastian Ott wrote:
>> On Mon, 25 Sep 2023, Eric W. Biederman wrote:
>> >
>> > Implement a helper elf_load that wraps elf_map and performs all
>> > of the necessary work to ensure that when "memsz > filesz"
>> > the bytes described by "memsz > filesz" are zeroed.
>> >
>> > Link: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@xxxxxxxxxxxxxx
>> > Reported-by: Sebastian Ott <sebott@xxxxxxxxxx>
>> > Reported-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
>> > Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> > ---
>> > fs/binfmt_elf.c | 111 +++++++++++++++++++++---------------------------
>> > 1 file changed, 48 insertions(+), 63 deletions(-)
>> >
>> > Can you please test this one?
>
> Eric thanks for doing this refactoring! This does look similar to the
> earlier attempt:
> https://lore.kernel.org/lkml/20221106021657.1145519-1-pedro.falcato@xxxxxxxxx/
> and it's a bit easier to review.

I need to context switch away for a while so Kees if you will
I will let you handle the rest of this one.


A couple of thoughts running through my head for anyone whose ambitious
might include cleaning up binfmt_elf.c

The elf_bss variable in load_elf_binary can be removed.

Work for a follow on patch is using my new elf_load in load_elf_interp
and possibly in load_elf_library. (More code size reduction).

An outstanding issue is if the first segment has filesz 0, and has a
randomized locations. But that is the same as today.

There is a whole question does it make sense for the elf loader
to have it's own helper vm_brk_flags in mm/mmap.c or would it
make more sense for binfmt_elf to do what binfmt_elf_fdpic does and
have everything to go through vm_mmap.

I think replacing vm_brk_flags with vm_mmap would allow fixing the
theoretical issue of filesz 0 and randomizing locations.



In this change I replaced an open coded padzero that did not clear
all of the way to the end of the page, with padzero that does.

I also stopped checking the return of padzero as there is at least
one known case where testing for failure is the wrong thing to do.
It looks like binfmt_elf_fdpic may have the proper set of tests
for when error handling can be safely completed.

I found a couple of commits in the old history
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git,
that look very interesting in understanding this code.

commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail")
commit c6e2227e4a3e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c")
commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2")

Looking at commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail"):
> commit 39b56d902bf35241e7cba6cc30b828ed937175ad
> Author: Pavel Machek <pavel@xxxxxx>
> Date: Wed Feb 9 22:40:30 2005 -0800
>
> [PATCH] binfmt_elf: clearing bss may fail
>
> So we discover that Borland's Kylix application builder emits weird elf
> files which describe a non-writeable bss segment.
>
> So remove the clear_user() check at the place where we zero out the bss. I
> don't _think_ there are any security implications here (plus we've never
> checked that clear_user() return value, so whoops if it is a problem).
>
> Signed-off-by: Pavel Machek <pavel@xxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>

It seems pretty clear that binfmt_elf_fdpic with skipping clear_user
for non-writable segments and otherwise calling clear_user (aka padzero)
and checking it's return code is the right thing to do.

I just skipped the error checking as that avoids breaking things.

It looks like Borland's Kylix died in 2005 so it might be safe to
just consider read-only segments with memsz > filesz an error.


Looking at commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2") the
binfmt_elf.c bits confirm my guess that the weird structure is because
before that point binfmt_elf.c assumed there would be only a single
segment with memsz > filesz. Which is why the code was structured so
weirdly.

Looking a little farther it looks like the binfmt_elf.c was introduced
in Linux v1.0, with essentially the same structure in load_elf_binary as
it has now. Prior to that Linux hard coded support for a.out binaries
in execve. So if someone wants to add a Fixes tag it should be
"Fixes: v1.0"

Which finally explains to me why the code is so odd. For the most part
the code has only received maintenance for the last 30 years or so.
Strictly 29 years, but 30 has a better ring to it.

Anyway those are my rambling thoughts that might help someone.
For now I will be happy if we can get my elf_load helper tested
to everyone's satisfaction and merged.

Eric