Re: get_arg_page() && ptr_size accounting

From: Kees Cook
Date: Tue Sep 11 2018 - 00:23:23 EST


On Mon, Sep 10, 2018 at 10:18 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 09/10, Kees Cook wrote:
>>
>> > So get_arg_page() does
>> >
>> > /*
>> > * Since the stack will hold pointers to the strings, we
>> > * must account for them as well.
>> > *
>> > * The size calculation is the entire vma while each arg page is
>> > * built, so each time we get here it's calculating how far it
>> > * is currently (rather than each call being just the newly
>> > * added size from the arg page). As a result, we need to
>> > * always add the entire size of the pointers, so that on the
>> > * last call to get_arg_page() we'll actually have the entire
>> > * correct size.
>> > */
>> > ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>> > if (ptr_size > ULONG_MAX - size)
>> > goto fail;
>> > size += ptr_size;
>> >
>> > OK, but
>> > acct_arg_size(bprm, size / PAGE_SIZE);
>> >
>> > after that doesn't look exactly right. This additional space will be used later
>> > when the process already uses bprm->mm, right? so it shouldn't be accounted by
>> > acct_arg_size().
>>
>> My understanding (based on the comment about acct_arg_size()) is that
>> before exec_mmap() happens, the memory used to build the new arguments
>> copy memory area gets accounted to the MM_ANONPAGES resource limit of
>> the execing process.
>
> Yes, because otherwise oom-killer can't account the memory populated by
> get_arg_page() in bprm->mm.
>
>> I couldn't find any place where the argc/envc
>> pointers were being included in the count,
>
> But why??? To clarify,
>
> size += ptr_size;
>
> after acct_arg_size() is clear and correct, we are going to check rlim_stack
> and thus the size should include the pointers we will add in create_elf_tables().
>
> But acct_arg_size() should only account the pages we allocate for bprm->mm,
> nothing more. create_elf_tables() does not allocate the memory when it populates
> arg_start/arg_end/env_start/env_end. Plus at this time the process has already
> switched to bprm->mm.

I've looked more closely now. So, while I agree with you about
resource limits, there's a corner case that is better handled here:
once we've called flush_old_exec(), we can no longer send errors back
to the parent. We just segfault. So, I think it's better to give a
resource limit error early, since it is able to do the math early.

If we move acct_arg_size() earlier, then the "immediate" resource
utilization is checked, but it means it can just segfault later. If we
leave it as-is, we account for later memory allocations "too early",
but we'll still not be able to run: but we can tell the parent why.

I prefer leave it as-is.

>> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...
>>
>> Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
>> What doesn't look right to you?
>
> Please forget. I meant that _if_ we actually wanted to account this additional
> memory in bprm->pages, than we would probably need something like
> acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).

I'd need to study that more, but that change seems reasonable. :)

-Kees

--
Kees Cook
Pixel Security