Re: [RESEND PATCH] uapi: increase MAX_ARG_STRLEN from 128K to 6M

From: Kees Cook
Date: Tue Oct 17 2023 - 13:00:05 EST


On Mon, Oct 16, 2023 at 11:33:18PM +0100, Sergei Trofimovich wrote:
> On Mon, 16 Oct 2023 14:50:05 -0700
> Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> > On Mon, Oct 16, 2023 at 10:25:07PM +0100, Sergei Trofimovich wrote:
> > > Before the change linux allowed individual execve() arguments or
> > > environment variable entries to be only as big as 32 pages.
> > >
> > > Histroically before b6a2fea3931 "mm: variable length argument support"
> > > MAX_ARG_STRLEN used to be full allowed size `argv[] + envp[]`.
> > >
> > > When full limit was abandoned individual parameters were still limited
> > > by a safe limit of 128K.
> > >
> > > Nowadays' linux allows `argv[]+envp[]` to be as laerge as 6MB (3/4
> > > `_STK_LIM`).
> >
> > See bprm_stack_limits() for the details, but yes, 3/4 _STK_LIM tends to
> > be the max, unless the rlim_stack is set smaller.
> >
> > > Some build systems like `autoconf` use a single environment variable
> > > to pass `CFLAGS` environment variable around. It's not a bug problem
> > > if the argument list is short.
> > >
> > > But some packaging systems prefer installing each package into
> > > individual directory. As a result that requires quite long string of
> > > parameters like:
> > >
> > > CFLAGS="-I/path/to/pkg1 -I/path/to/pkg2 ..."
> > >
> > > This can easily overflow 128K and does happen for `NixOS` and `nixpkgs`
> > > repositories on a regular basis.
> >
> > That's ... alarming. What are you doing currently to work around this?
>
> We usually try to stay under the threshold. When the problem arises due
> to organic growth of inputs over time we either suffer build failures
> without any reasonable workarounds or if the change was recent and
> inflated command line options we revert the change and add hacks into
> other places (like patching `gcc` directly to apply the
> transformations).
>
> Longer term it would be nice for `gcc` to allow unbounded output via
> response files, but it will take some time to sort out as current flags
> rewriting expands all flags and response files into a single huge
> variable and hits the 128K limit:
>
> https://gcc.gnu.org/pipermail/gcc/2023-October/242641.html
>
> Medium term dropping kernel's arbitrary small limit (this change) sounds
> like the simplest solution.
>
> > >
> > > Similar pattern is exhibited by `gcc` which converts it's input command
> > > line into a single environment variable (https://gcc.gnu.org/PR111527):
> > >
> > > $ big_100k_var=$(printf "%0*d" 100000 0)
> > >
> > > # this works: 200KB of options for `printf` external command
> > > $ $(which printf) "%s %s" $big_100k_var $big_100k_var >/dev/null; echo $?
> > > 0
> > >
> > > # this fails: 200KB of options for `gcc`, fails in `cc1`
> > > $ touch a.c; gcc -c a.c -DA=$big_100k_var -DB=$big_100k_var
> > > gcc: fatal error: cannot execute 'cc1': execv: Argument list too long
> > > compilation terminated.
> > >
> > > I would say this 128KB limitation is arbitrary.
> > > The change raises the limit of `MAX_ARG_STRLEN` from 32 pakes (128K
> > > n `x86_64`) to the maximum limit of stack allowed by Linux today.
> > >
> > > It has a minor chance of overflowing userspace programs that use
> > > `MAX_ARG_STRLEN` to allocate the strings on stack. It should not be a
> > > big problem as such programs are already are at risk of overflowing
> > > stack.
> > >
> > > Tested as:
> > > $ V=$(printf "%*d" 1000000 0) ls
> > >
> > > Before the change it failed with `ls: Argument list too long`. After the
> > > change `ls` executes as expected.
> > >
> > > WDYT of abandoning the limit and allow user to fill entire environment
> > > with a single command or a single variable?
> >
> > Changing this value shouldn't risk any vma collisions, since exec is
> > still checking the utilization before starting the actual process
> > replacement steps (see bprm->argmin).
> >
> > It does seem pathological to set this to the full 6MB, though, since
> > that would _immediately_ get rejected by execve() with an -E2BIG, but
> > ultimately, there isn't really any specific limit to the length of
> > individual strings as long as the entire space is less than
> > bprm->argmin.
> >
> > Perhaps MAX_ARG_STRLEN should be removed entirely and the kernel can
> > just use bprm->argmin? I haven't really looked at that to see if it's
> > sane, though. It just feels like MAX_ARG_STRLEN isn't a meaningful
> > limit.
>
> Removing the limit entirely in favour of 'bprm->argmin' sounds good.
> I'll try to make it so and will send v2.
>
> What should be the fate of userspace-exported `MAX_ARG_STRLEN` value in
> that case? Should it stay `(PAGE_SIZE * 32)`? Should it be removed
> entirely? (a chance of user space build failures).
>
> If we are to remove it entirely what should be the reasonable limit in
> kernel for other subsystems that still use it?
>
> fs/binfmt_elf.c: len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> fs/binfmt_elf_fdpic.c: len = strnlen_user(p, MAX_ARG_STRLEN);
> fs/binfmt_flat.c: len = strnlen_user(p, MAX_ARG_STRLEN);
> kernel/auditsc.c: len_full = strnlen_user(p, MAX_ARG_STRLEN) - 1;
>
> Keeping it at an "arbitrary" 6MB limit looked safe :)

Yeah, I think leaving MAX_ARG_STRLEN totally unchanged but adjust where
it is used only for the ELF loader is probably the least risky thing to
do here.

-Kees

>
> > -Kees
> >
> > >
> > > CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > CC: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> > > CC: Kees Cook <keescook@xxxxxxxxxxxx>
> > > CC: linux-mm@xxxxxxxxx
> > > CC: linux-kernel@xxxxxxxxxxxxxxx
> > > Signed-off-by: Sergei Trofimovich <slyich@xxxxxxxxx>
> > > ---
> > > include/uapi/linux/binfmts.h | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/binfmts.h b/include/uapi/linux/binfmts.h
> > > index c6f9450efc12..4e828515a22e 100644
> > > --- a/include/uapi/linux/binfmts.h
> > > +++ b/include/uapi/linux/binfmts.h
> > > @@ -8,11 +8,11 @@ struct pt_regs;
> > >
> > > /*
> > > * These are the maximum length and maximum number of strings passed to the
> > > - * execve() system call. MAX_ARG_STRLEN is essentially random but serves to
> > > - * prevent the kernel from being unduly impacted by misaddressed pointers.
> > > + * execve() system call. MAX_ARG_STRLEN is as large as Linux allows new
> > > + * stack to grow. Currently it's `_STK_LIM / 4 * 3 = 6MB` (see fs/exec.c).
> > > * MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
> > > */
> > > -#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
> > > +#define MAX_ARG_STRLEN (6 * 1024 * 1024)
> > > #define MAX_ARG_STRINGS 0x7FFFFFFF
> > >
> > > /* sizeof(linux_binprm->buf) */
> > > --
> > > 2.42.0
> > >
> >
> > --
> > Kees Cook
>
>
> --
>
> Sergei

--
Kees Cook