Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

From: Toshi Kani
Date: Tue Mar 15 2016 - 18:56:23 EST


On Tue, 2016-03-15 at 01:15 +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote:
> > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote:
> > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani <toshi.kani@xxxxxxx>
> > > wrote:
> > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote:
> > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote:
Â:
> > > > > What I'm after is seeing if we can ultimately disable MTRR on
> > > > > kernel code but still have PAT enabled. I realize you've
> > > > > mentioned BIOS code may use some MTRR setup code but this is only
> > > > > true for some systems. I know for a fact Xen cannot use MTRR, it
> > > > > seems qemu32 does not enable it either. So why not have the
> > > > > ability to skip through its set up ?
> > > >
> > > > MTRR support has two meanings:
> > > > Â1) The kernel keeps the MTRR setup by BIOS.
> > > > Â2) The kernel modifies the MTRR setup.
> > > >
> > > > I am in a position that we need 1) but 2).
> > >
> > > I take it you meant "but not 2)" ?Â
> >
> > Yes. :)
>
> OK -- we are in agreement but we know 1) is only needed for a portion of
> systems: Xen and qemu32 systems fly with no MTRR set up, and as such it
> would be incorrect to run MTRR code on such systems. To these systems
> MTRR functionality code should be dead, since PAT currently depends on
> MTRR PAT should also be dead but as the report you're fixing shows
> it wasn't. That's an issue for qemu that uses the regular x86 init path
> but not for Xen. Its different for Xen as the hypervisor is the one that
> set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is:

MTRR code does not have to be dead for the virtual machines with no MTRR
support. ÂThe code just needs to handle the disabled case properly. ÂI
consider this is part of 1) that the kernel keeps the MTRR state as
disabled.

> void xen_start_kernel(void)
> {
> ...
> ÂÂÂÂÂÂÂÂrdmsrl(MSR_IA32_CR_PAT,Âpat);ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
> ÂÂÂÂÂÂÂ
> ÂÂÂÂÂÂÂÂpat_init_cache_modes(pat);Â
> ...
> }
>
> Fortunately we only have to call pat_init_cache_modes() once, its
> not per-CPU. Xen has shown then that you *can* live with PAT without
> any of the complex MTRR setup / code. Please add to your table the
> Xen case as well then as its important to consider. If you make it
> a strong requirement to have MTRR enabled to enable PAT you'd
> be disabling PAT on Xen guest boots.

Yes, I understand that the original fix will break Xen. ÂThanks for
pointing this out! ÂIn the next version (which is the same approach as the
two additional patches I sent you yesterday), I am going to integrate the
Xen's use-case into the framework. Âxen_start_kernel() will no longer need
to callÂpat_init_cache_modes() as a result. ÂSo, please review the next
version, and let me know if there is any issue.

> As-is then your this patch which calls pat_disable() on mtrr_bp_init()
> for the case where MTRR is disabled would essentially break PAT on Xen
> guests, so this cannot be done. It is no longer true that if MTRR is
> disabled you can force disable PAT. To do what you want you want to do we
> have to consider Xen.
>
> I don't think its a good idea to keep PAT initialization meshed together
> with MTRR and making it a strong requirement on enabling PAT. The MTRR
> code is extremely complex. I'd like instead to encourage for us to
> consider for this situation to let PAT become a first class citizen,
> if MTRR is disabled but you've enabled PAT you should be able to use
> it, just as Xen does. There are more reasons to enable such setup than
> not to. Long term I'm advocating to see if we can get an ACPI legacy
> MTRR flag that can tell us if the BIOS has MTRR ripped out, then we
> can at run time also take advantage of ignoring PAT completely as well.
>
> Note, if you insisted you didn't want to disable PAT on Xen, you could
> in theory check for the subarch -- but note that the subarch is unused
> yet on Xen, even though it was added to the x86 boot protocol years ago.
> I have a slew of patches to make use of it to help put paravirt_enabled()
> in the grave, but based on discussions with Ingo, we don't want to spread
> use of the subarch in random x86 code paths, we want to compartamentalize
> that. If you still want to follow your approach of just force-disabling
> PAT on MTRR code if MTRR was disabled you'd have to use semantics to
> figure out if the boot path came from Xen, to be more specific for Xen
> PV guest types only... The current agreed approach to avoid directly
> using subarch is to categorize differences between what some guests
> need and bare metal under an x86 platform quirk and legacy set of
> components.
>
> On the x86 init path we'd call something check for the subarch and based
> on that set a series of x86 legacy features / quirks that need to be
> disabled / enabled. We could add MTRR as one. I'm unifying some of this
> with a bit of what goes into the ACPI IA-PC boot architecture, see
> section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the
> paravirt_enabled() series I'm working on happens to also dabble into
> the no CMOS RTC case for Xen, generalizing this knocks a bit of birds
> with one stone. I think we can do the same with MTRR but also be
> proactive and see if we can get ACPI_FADT_NO_MTRR added as well for
> a future ACPI spec to enable BIOS manufacturers to rip MTRR out.

We do not need subarch for this case since presence of the MTRR feature can
be tested with CPUID and MSR.

> [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdfÂ;
>
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags)
> [Vx]=Introduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICESÂÂÂÂ(1)ÂÂÂÂÂÂÂÂÂ/* 00: [V2] System has
> LPC or ISA bus devices */
> #define ACPI_FADT_8042ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ(1<<1)ÂÂÂÂÂÂ/* 01: [V3] System has an
> 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGAÂÂÂÂÂÂÂÂÂÂÂÂ(1<<2)ÂÂÂÂÂÂ/* 02: [V4] It is not
> safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSIÂÂÂÂÂÂÂÂÂÂÂÂ(1<<3)ÂÂÂÂÂÂ/* 03: [V4] Message
> Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPMÂÂÂÂÂÂÂÂÂÂÂ(1<<4)ÂÂÂÂÂÂ/* 04: [V4] PCIe ASPM
> control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTCÂÂÂÂÂÂÂ(1<<5)ÂÂÂÂÂÂ/* 05: [V5] No CMOS real-
> time clock present */
>
> > > There *are folks however who do
> > > more as I noted earlier. Perhaps not now, but in the future I'd
> > > encourage folks to rip MTRR out of their own BIOS, and enable a new
> > > ACPI legacy flag to say "MTRR required". That'd eventually can help
> > > bury MTRR for good while remaining backward compatible.
> >
> > Well, BIOS using MTRR is better than BIOS setting page tables in the
> > SMI handler.
>
> Can some BIOSes be developed without MTRR? For instance I suspect Google
> might be able to easily pull of ripping MTRR out of their BIOS if they
> didn't do it already for the ChromeOS devices. If possible not only
> should it help with removing complexity on the BIOS but not even having
> to think about that code *ever* running on the kernel at all should be
> nice.

AFAIK, MTRR is the only way to specify UC attribute in physical mode on
x86. ÂOn ia64, one can simply set the UC bit to a physical address to
specify UC attribute. ÂI wish we had something similar.

> > The kernel can be ignorant of the MTRR setup as long as it does
> > not modify it.
>
> Sure, we're already there. The kernel no longer modifies the
> MTRR setup unless of course you boot without PAT enabled. I think
> we need to move beyond that to ACPI if we can to let regular
> Linux boots never have to deal with MTRR at all. The code is
> complex and nasty why not put let folks put a nail on the coffin for
> good?

I think we are good as long as we do not modify it. ÂThe complexity comes
with the modification.

> > > I can read the above description to also say:
> > >
> > > "Hey you need to implement PAT with the same skeleton code as MTRR"
> >
> > No, I did not say that. ÂMTRR's rendezvous handler can be generalized
> > to work with both MTRR and PAT. ÂWe do not need two separate handlers.
> > ÂIn fact, it needs to be a single handler so that both can be
> > initialized together.
>
> I'm not sure if that's really needed. Doesn't PAT just require setting
> the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP?

No, it requires the same procedure as MTRR.

> > > If we do that, we can pave the way to deprecate MTRR as legacy for
> > > good first on Linux.
> >
> > I do not think such change will deprecate MTRR.
>
> Not even for shiny new BIOSes? Post ACPI 5?

Nope.

> > ÂIt just means that Linux can enable PAT on virtual CPUs with PAT &
> > !MTRR capability.
> >
> > > > In fact, the kernel disabling MTRRs is the same as 2).
> > > >
> > > > > I'll also note Xen managed to enable PAT only without enabling
> > > > > MTRR, this was done through pat_init_cache_modes() -- not sure if
> > > > > this can be leveraged for qemu32...
> > > >
> > > > I am interested to know how Xen managed this.ÂÂIs this done by the
> > > > Xen hypervisor initializes guest's PAT on behalf of the guest
> > > > kernel?
> > >
> > > Yup. And the cache read thingy was reading back its own setup, which
> > > was different than what Linux used by default IIRC. Juergen can
> > > elaborate more.
> >
> > Yeah, I'd like to make sure that my changes won't break it.
>
> I checked through code inspection and indeed, it seems it would break
> Xen's PAT setup.
>
> For the record: the issue here was code that should not run ran, that
> is dead code ran. I'm working towards a generic solution for this.

Please review the next version.

Thanks,
-Toshi