Re: A logical error in arch/x86/mm/init.c

From: Nikita Popov
Date: Fri Feb 04 2022 - 00:36:05 EST


Thank you for your attention.
> If you really feel that this is something that needs to be fixed, I'd
> appreciate if you could find some way to reproduce it and then send a
> proper patch.
I believe this would be hard to reproduce.
I just noticed this discrepancy during manual code review.
I'm considering the following facts:
1) The 'pgt_buf' area is part of the 'brk' area defined in the linker
script. It is allocated in the function 'early_alloc_pgt_buf' using
the very same 'extend_brk'. The latter is essentially a stack-based
allocator that carves its memory slices from the linker-defined area.
2) Allocations from 'pgt_buf' are made in a stack-like manner too.
One can therefore expect these two areas (one of which is completely
contained in the other) to have the same properties with respect to
the direct memory mapping.
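
To make "stack-based allocator" concrete, here is a minimal user-space
sketch of the idea. It is not the kernel's 'extend_brk'; the names
'model_brk', 'model_brk_end', 'model_extend_brk' and the 4096-byte size
are hypothetical, chosen only for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a brk-style bump allocator. */
#define MODEL_BRK_SIZE 4096u

/* Stands in for the area reserved by the linker script. */
static unsigned char model_brk[MODEL_BRK_SIZE];
/* Offset of the first free byte; it only ever grows. */
static size_t model_brk_end;

/*
 * Hand out 'size' zeroed bytes whose offset within model_brk is a
 * multiple of 'align' (a power of two). Returns NULL when the request
 * is malformed or the area is exhausted.
 */
static void *model_extend_brk(size_t size, size_t align)
{
    size_t mask = align - 1;
    size_t start;

    if (align == 0 || (align & mask) != 0)
        return NULL;

    /* Round the current end up to the requested alignment. */
    start = (model_brk_end + mask) & ~mask;
    if (start > MODEL_BRK_SIZE || size > MODEL_BRK_SIZE - start)
        return NULL;

    model_brk_end = start + size;
    memset(&model_brk[start], 0, size);
    return &model_brk[start];
}
```

Everything below 'model_brk_end' is in use and everything above it is
free, which is the property both areas share.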

Then there is the flag 'can_use_brk_pgt' which allows usage of the
pgt_buf area if a mapped range doesn't overlap with the free space of
the pgt_buf area. In the 'init_range_memory_mapping' function we can
observe that this flag doesn't reflect the relative position between a
mapped range and the free space of the brk area as a whole:
	/*
	 * if it is overlapping with brk pgt, we need to
	 * alloc pgt buf from memblock instead.
	 */
	can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >=
			    min(end, (u64)pgt_buf_top<<PAGE_SHIFT);
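This check is simply too narrow: it considers only the free space of
the pgt_buf area, not the free space of the brk area as a whole.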
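
For reference, the interval test in that expression can be modelled in
user space as follows. This is only a sketch: 'MODEL_PAGE_SHIFT',
'model_can_use_brk_pgt' and the helper names are hypothetical, and a
4 KiB page size is assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this model: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

static uint64_t model_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t model_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Mirrors the quoted expression: true when the byte range [start, end)
 * shares no byte with the free pgt_buf pages, given as the page frame
 * numbers [buf_end_pfn, buf_top_pfn).
 */
static bool model_can_use_brk_pgt(uint64_t start, uint64_t end,
                                  uint64_t buf_end_pfn, uint64_t buf_top_pfn)
{
    uint64_t free_lo = buf_end_pfn << MODEL_PAGE_SHIFT;
    uint64_t free_hi = buf_top_pfn << MODEL_PAGE_SHIFT;

    /* Two half-open ranges are disjoint iff max(lows) >= min(highs). */
    return model_max(start, free_lo) >= model_min(end, free_hi);
}
```

With free pages 0x100..0x106, a mapped range ending exactly at 0x100000
is accepted, while a range from 0 to 0x200000 is rejected.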

So whatever the reason this flag prohibits usage of the pgt_buf area,
I believe that for the exact same reason we have to avoid using the brk
area when the analogous condition on the free space of the brk area holds.
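
To illustrate the gap I mean, here is a user-space sketch comparing the
existing test with the same test applied to the free space of the whole
brk area. The struct, the function names and the layout used with it
are hypothetical; this is not a tested kernel patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when half-open byte ranges [a_lo, a_hi) and [b_lo, b_hi) share no byte. */
static bool model_disjoint(uint64_t a_lo, uint64_t a_hi,
                           uint64_t b_lo, uint64_t b_hi)
{
    uint64_t lo = a_lo > b_lo ? a_lo : b_lo;
    uint64_t hi = a_hi < b_hi ? a_hi : b_hi;

    return lo >= hi;
}

/* Hypothetical layout, all values in bytes. */
struct model_layout {
    uint64_t pgt_free_lo, pgt_free_hi; /* free part of pgt_buf */
    uint64_t brk_free_lo, brk_free_hi; /* free part of the whole brk area */
};

/* The existing test: looks only at the free part of pgt_buf. */
static bool model_narrow_check(const struct model_layout *l,
                               uint64_t start, uint64_t end)
{
    return model_disjoint(start, end, l->pgt_free_lo, l->pgt_free_hi);
}

/* The analogous test on the free part of the whole brk area. */
static bool model_wide_check(const struct model_layout *l,
                             uint64_t start, uint64_t end)
{
    return model_disjoint(start, end, l->brk_free_lo, l->brk_free_hi);
}
```

Take pgt_buf free space [0x100000, 0x106000) and brk free space
[0x106000, 0x110000). For a mapped range [0x108000, 0x200000) the
narrow check passes, although the range overlaps the free brk space
that a following 'extend_brk' call would hand out.
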
> This _might_ be right. But, my confidence that it won't break anything
> else is pretty low. It's also obviously not been tested.
Yes, I agree here. I saw it as my duty to report the possible issue.
> What are these "MMU issues"?
I tried to deduce the underlying reason behind the code fragments in
question. I presumed that the overlap check protects against some MMU
issues that could affect the stability of the kernel.

Kind regards,
Nikita Popov
Senior C Developer

On Thu, Feb 3, 2022 at 10:27 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 2/3/22 02:30, Nikita Popov wrote:
> > It appears that there is a logical error in arch/x86/mm/init.c in the
> > master branch. Although it is unlikely to occur in practice. I
> > discovered it while studying source code in that file.
>
> I looked at this a bit. It seems to have originated in:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=c9b3234a6aba
>
> which isn't the best changelog in the history of the world. It's also
> fixing a boot problem in a configuration that I can't readily reproduce
> (Xen PV guest).
>
> There's one thing from the old changelog that's confusing me:
>
> > But after we get back that page for pgt, it tiggers one bug in pgt allocation
> > with xen: We need to avoid to use page as pgt to map range that is
> > overlapping with that pgt page.
>
> and presumably alluding to the same issue from your mail:
>
> > ... which can incur MMU issues if that page is allocated as a page
> > directory)
>
> What are these "MMU issues"?
>
> > In my opinion one of the simplest fixes here is to completely remove
> > the following lines:
> > if (!ret && can_use_brk_pgt)
> > ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE));
>
> This _might_ be right. But, my confidence that it won't break anything
> else is pretty low. It's also obviously not been tested.
>
> I'd be much more confident if this issue was reproduced, even if the
> reproduction was contrived by doing something like purposefully
> exhausting the pgt_buf_* area.
>
> If you really feel that this is something that needs to be fixed, I'd
> appreciate if you could find some way to reproduce it and then send a
> proper patch.