RE: [PATCH v2 01/12] x86/ioremap: Fix page aligned size calculation in __ioremap_caller()

From: Michael Kelley (LINUX)
Date: Fri Nov 11 2022 - 23:32:01 EST


From: Dave Hansen <dave.hansen@xxxxxxxxx> Sent: Friday, November 11, 2022 4:12 PM
>
> On 11/10/22 22:21, Michael Kelley wrote:
> > If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes
> > upper bits to be masked out, the re-calculation of size to account for
> > page alignment is incorrect because the same bits are not masked out
> > in last_addr.
> >
> > Fix this by masking the page aligned last_addr as well.
>
> This makes sense at first glance.
>
> How did you notice this? What is the impact to users? Did the bug
> actually cause you some trouble or was it by inspection? Do you have a
> sense of how many folks might be impacted? Any thoughts on how it
> lasted for 14+ years?
>
> For the functionality of the mapping, I guess 'size' doesn't really
> matter because even a 1-byte 'size' will map a page. The other fallout
> would be from memtype_reserve() reserving too little. But, that's
> unlikely to matter for small mappings because even though:
>
> ioremap(0x1800, 0x800);
>
> would end up just reserving 0x1000->0x1800, it still wouldn't allow
>
> ioremap(0x1000, 0x800);
>
> to succeed because *both* of them would end up trying to reserve the
> beginning of the page. Basically, the first caller effectively reserves
> the whole page and any second user will fail.
>
> So the other place it would matter would be for mappings that span two
> pages, say:
>
> ioremap(0x1fff, 0x2)
>
> But I guess those aren't very common. Most large ioremap() callers seem
> to already have base and size page-aligned.
>
> Anyway, sorry to make so much of a big deal about a one-liner. But,
> these decade-old bugs really make me wonder how they stuck around for so
> long.
>
> I'd be curious if you thought about this too while putting together this
> fix.

The bug only manifests if the phys_addr input argument exceeds
PHYSICAL_PAGE_MASK, which is derived from the global variable physical_mask
and reflects the size of the machine's or VM's physical address space. That's
the only case in which masking with PHYSICAL_PAGE_MASK changes anything, so I
don't see that your examples fit the situation. In the case where the masking
does clear some high-order bits, the "size" calculation yields a huge number,
which then quickly causes an error.

With that understanding, I'd guess that over the last 14 years, the bug has
never manifested, or if it did, it was due to something badly broken in the
caller. It's not clear why masking with PHYSICAL_PAGE_MASK is there in the
first place, other than as a "safety check" on the phys_addr input argument
that wasn't done quite correctly.

I hit the issue because this patch series does a *transition* in how the AMD
SNP "vTOM" bit is handled. vTOM is bit 46 in a 47-bit physical address space --
i.e., it's the high order bit. Current code treats the vTOM bit as part of the
physical address, passes addresses with vTOM set into __ioremap_caller(), and
everything works. But Patch 5 of this patch series changes the underlying
global variable physical_mask to remove bit 46, similar to what
tdx_early_init() does. At that point, passing __ioremap_caller() a phys_addr
with the vTOM bit set causes the bug and a failure. With the fix, Patch 5 in
this series causes __ioremap_caller() to mask out the vTOM bit, which is what
I want, at least temporarily.

Later patches in the series change things so that we no longer pass a
phys_addr to __ioremap_caller() that has the vTOM bit set. After those
later patches, this fix to __ioremap_caller() isn't needed. But I wanted to
avoid cramming all the vTOM-related changes into a single huge patch.
Having __ioremap_caller() correctly handle a phys_addr that exceeds
physical_mask instead of blowing up lets this patch series sequence things
into reasonably sized chunks. And given that the __ioremap_caller() code is
wrong regardless, fixing it seemed like a reasonable overall solution.

Michael