Re: [PATCH] binfmt_elf: Support segments with 0 filesz and misaligned starts

From: Kees Cook
Date: Tue Sep 26 2023 - 23:17:02 EST


On Mon, Sep 25, 2023 at 10:27:02PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@xxxxxxxxxxxx> writes:
>
> > On Mon, Sep 25, 2023 at 05:27:12PM +0200, Sebastian Ott wrote:
> >> On Mon, 25 Sep 2023, Eric W. Biederman wrote:
> >> >
> >> > Implement a helper elf_load that wraps elf_map and performs all
> >> > of the necessary work to ensure that when "memsz > filesz"
> >> > the bytes described by "memsz > filesz" are zeroed.
> >> >
> >> > Link: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@xxxxxxxxxxxxxx
> >> > Reported-by: Sebastian Ott <sebott@xxxxxxxxxx>
> >> > Reported-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
> >> > Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> >> > ---
> >> > fs/binfmt_elf.c | 111 +++++++++++++++++++++---------------------------
> >> > 1 file changed, 48 insertions(+), 63 deletions(-)
> >> >
> >> > Can you please test this one?
> >
> > Eric thanks for doing this refactoring! This does look similar to the
> > earlier attempt:
> > https://lore.kernel.org/lkml/20221106021657.1145519-1-pedro.falcato@xxxxxxxxx/
> > and it's a bit easier to review.
>
> I need to context switch away for a while so Kees if you will
> I will let you handle the rest of this one.
>
>
> A couple of thoughts running through my head for anyone whose ambitious
> might include cleaning up binfmt_elf.c
>
> The elf_bss variable in load_elf_binary can be removed.
>
> Work for a follow on patch is using my new elf_load in load_elf_interp
> and possibly in load_elf_library. (More code size reduction).
>
> An outstanding issue is if the first segment has filesz 0, and has a
> randomized locations. But that is the same as today.
>
> There is a whole question does it make sense for the elf loader
> to have it's own helper vm_brk_flags in mm/mmap.c or would it
> make more sense for binfmt_elf to do what binfmt_elf_fdpic does and
> have everything to go through vm_mmap.
>
> I think replacing vm_brk_flags with vm_mmap would allow fixing the
> theoretical issue of filesz 0 and randomizing locations.
>
>
>
> In this change I replaced an open coded padzero that did not clear
> all of the way to the end of the page, with padzero that does.

Yeah, the resulting code is way more readable now.

> I also stopped checking the return of padzero as there is at least
> one known case where testing for failure is the wrong thing to do.
> It looks like binfmt_elf_fdpic may have the proper set of tests
> for when error handling can be safely completed.
>
> I found a couple of commits in the old history
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git,
> that look very interesting in understanding this code.
>
> commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail")
> commit c6e2227e4a3e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c")
> commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2")
>
> Looking at commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail"):
> > commit 39b56d902bf35241e7cba6cc30b828ed937175ad
> > Author: Pavel Machek <pavel@xxxxxx>
> > Date: Wed Feb 9 22:40:30 2005 -0800
> >
> > [PATCH] binfmt_elf: clearing bss may fail
> >
> > So we discover that Borland's Kylix application builder emits weird elf
> > files which describe a non-writeable bss segment.
> >
> > So remove the clear_user() check at the place where we zero out the bss. I
> > don't _think_ there are any security implications here (plus we've never
> > checked that clear_user() return value, so whoops if it is a problem).
> >
> > Signed-off-by: Pavel Machek <pavel@xxxxxxx>
> > Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
> > Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
>
> It seems pretty clear that binfmt_elf_fdpic with skipping clear_user
> for non-writable segments and otherwise calling clear_user (aka padzero)
> and checking it's return code is the right thing to do.
>
> I just skipped the error checking as that avoids breaking things.
>
> It looks like Borland's Kylix died in 2005 so it might be safe to
> just consider read-only segments with memsz > filesz an error.

I really feel like having a read-only BSS is a pathological state that
should be detected early?

> Looking at commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2") the
> binfmt_elf.c bits confirm my guess that the weird structure is because
> before that point binfmt_elf.c assumed there would be only a single
> segment with memsz > filesz. Which is why the code was structured so
> weirdly.

Agreed.

> Looking a little farther it looks like the binfmt_elf.c was introduced
> in Linux v1.0, with essentially the same structure in load_elf_binary as
> it has now. Prior to that Linux hard coded support for a.out binaries
> in execve. So if someone wants to add a Fixes tag it should be
> "Fixes: v1.0"
>
> Which finally explains to me why the code is so odd. For the most part
> the code has only received maintenance for the last 30 years or so.
> Strictly 29 years, but 30 has a better ring to it.
>
> Anyway those are my rambling thoughts that might help someone.
> For now I will be happy if we can get my elf_load helper tested
> to everyone's satisfaction and merged.

I'm probably going to pull most of this email into the commit log for
the v2 patch -- there's good history here worth capturing.

--
Kees Cook