Re: [PATCH 1/3] acpi/processor: fix evaluating _PDC method when running as Xen dom0

From: Roger Pau Monné
Date: Fri Dec 02 2022 - 07:24:52 EST


On Wed, Nov 30, 2022 at 08:48:14AM -0800, Dave Hansen wrote:
> On 11/30/22 07:53, Roger Pau Monné wrote:
> > On Tue, Nov 29, 2022 at 09:43:53AM -0800, Dave Hansen wrote:
> >> On 11/21/22 02:21, Roger Pau Monne wrote:
> >>> When running as a Xen dom0 the number of CPUs available to Linux can
> >>> be different from the number of CPUs present on the system, but in
> >>> order to properly fetch processor performance related data _PDC must
> >>> be executed on all the physical CPUs online on the system.
> >>
> >> How is the number of CPUs available to Linux different?
> >>
> >> Is this a result of the ACPI tables that dom0 sees being "wrong"?
> >
> > Depends on the mode. This is all specific to Linux running as a Xen
> > dom0.
> >
> > For a PV dom0 the ACPI tables that dom0 sees are the native ones;
> > however, the available CPUs are not detected based on the MADT but
> > via hypercalls, see the xen_smp_ops struct and the
> > x86_init.mpparse.get_smp_config hook used in smp_pv.c
> > (_get_smp_config()).
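
For reference, the detection on the PV side boils down to probing each
possible vCPU with a hypercall instead of walking the MADT.  A
simplified sketch (the function name is made up and the real code in
smp_pv.c does more bookkeeping):

#include <linux/init.h>
#include <linux/cpumask.h>
#include <linux/printk.h>
#include <xen/interface/vcpu.h>
#include <asm/xen/hypercall.h>

/*
 * Simplified sketch: ask Xen which vCPUs exist for this domain rather
 * than trusting the MADT.  VCPUOP_is_up returns < 0 for vCPU IDs the
 * domain doesn't have.
 */
static void __init xen_pv_count_vcpus(void)
{
        unsigned int i, vcpus = 0;

        for (i = 0; i < nr_cpu_ids; i++) {
                if (HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL) >= 0) {
                        vcpus++;
                        set_cpu_possible(i, true);
                } else {
                        set_cpu_possible(i, false);
                        set_cpu_present(i, false);
                }
        }

        pr_info("Xen PV: %u vCPUs available to this domain\n", vcpus);
}
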
> >
> > For a PVH dom0, Xen provides dom0 with a crafted MADT table that only
> > contains the CPUs available to dom0, and hence is likely different
> > from the native one present on the hardware.
> >
> > In any case, the dynamic ACPI tables that dom0 sees, where the
> > Processor objects/devices reside, are not modified by Xen in any way,
> > so the ACPI Processors are always exposed to dom0 exactly as they
> > appear in the native tables.
> >
> > Xen cannot parse the dynamic ACPI tables (nor should it, since then
> > it would act as OSPM), so it relies on dom0 to provide the same data
> > present in those tables so that Xen can properly manage the frequency
> > and idle states of the CPUs on the system.
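
As a rough illustration of that upload path (modeled on
drivers/xen/xen-acpi-processor.c; the helper name is made up and the
real driver fills struct xen_processor_performance from the
_PSS/_PSD/_PPC data it parsed as OSPM):

#include <linux/types.h>
#include <xen/interface/platform.h>
#include <asm/xen/hypercall.h>

/* Hand the parsed P-state data for one ACPI processor ID to Xen. */
static int push_px_to_xen(u32 acpi_id,
                          const struct xen_processor_performance *perf)
{
        struct xen_platform_op op = {
                .cmd                    = XENPF_set_processor_pminfo,
                .interface_version      = XENPF_INTERFACE_VERSION,
                .u.set_pminfo.id        = acpi_id,
                .u.set_pminfo.type      = XEN_PM_PX,
        };

        /* Px states, _PSD coordination info, etc. go in here. */
        op.u.set_pminfo.perf = *perf;

        return HYPERVISOR_platform_op(&op);
}
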
> >
> >>> The current checks in processor_physically_present() result in some
> >>> processor objects not getting their _PDC methods evaluated when Linux
> >>> is running as Xen dom0. Fix this by introducing a custom function to
> >>> use when running as Xen dom0 in order to check whether a processor
> >>> object matches a CPU that's online.
> >>
> >> What is the end user visible effect of this problem and of the solution?
> >
> > Without this fix _PDC is only evaluated for the CPUs online from
> > dom0's point of view, which means that if dom0 is limited to 8 CPUs
> > but the system has 24 CPUs, _PDC will only get evaluated for 8 CPUs.
> > That can have the side effect of the data returned by the _PSD method
> > (or other methods) differing between CPUs where _PDC was evaluated
> > and CPUs where it wasn't.  Such mismatches can ultimately lead to,
> > for example, the CPU frequency driver in Xen failing to initialize
> > because the coordination information reported for CPUs in the same
> > domain doesn't match.
> >
> > Also, not evaluating _PDC prevents the OS (or Xen in this case)
> > from notifying the ACPI firmware of the features it supports.
> >
> > IOW this fix attempts to make sure all physically online CPUs get _PDC
> > evaluated, and in order to do that we need to ask the hypervisor
> > whether a Processor ACPI ID matches an online CPU or not, because
> > Linux doesn't have that information when running as dom0.
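
To give an idea of what asking the hypervisor looks like, below is a
minimal sketch using the XENPF_get_cpuinfo platform op (the same
interface drivers/xen/pcpu.c uses); the helper name is made up and the
actual patch may do this differently:

#include <linux/types.h>
#include <xen/interface/platform.h>
#include <asm/xen/hypercall.h>

/* Does the given ACPI processor ID match a physically online CPU? */
static bool xen_acpi_id_online(u32 acpi_id)
{
        unsigned int cpu, max_present = 0;

        for (cpu = 0; cpu <= max_present; cpu++) {
                struct xen_platform_op op = {
                        .cmd                    = XENPF_get_cpuinfo,
                        .interface_version      = XENPF_INTERFACE_VERSION,
                        .u.pcpu_info.xen_cpuid  = cpu,
                };

                if (HYPERVISOR_platform_op(&op))
                        break;

                /* Xen reports how far the physical CPU IDs go. */
                max_present = op.u.pcpu_info.max_present;

                if (op.u.pcpu_info.flags & XEN_PCPU_FLAGS_INVALID)
                        continue;

                if (op.u.pcpu_info.acpi_id == acpi_id)
                        return op.u.pcpu_info.flags & XEN_PCPU_FLAGS_ONLINE;
        }

        return false;
}

processor_physically_present() would then return the result of such a
check when xen_initial_domain() is true, instead of consulting Linux's
own CPU maps.
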
> >
> > Hope the above makes sense and allows us to make some progress on the
> > issue; sometimes it's hard to summarize without getting too
> > specific.
>
> Yes, writing changelogs is hard. :)
>
> Let's try though. I was missing some key pieces of background here.
> Believe it or not, I had no idea off the top of my head what _PDC was or
> why it's important.
>
> The information about _PDC being required on all processors was
> missing, as was the information about dom0's incomplete view of the
> available physical processors.
>
> == Background ==
>
> In ACPI systems, the OS can direct power management, as opposed to the
> firmware. This OS-directed Power Management is called OSPM. Part of
> telling the firmware that the OS is going to direct power management is
> making ACPI "_PDC" (Processor Driver Capabilities) calls. These _PDC
> calls must be made on every processor. If these _PDC calls are not
> completed on every processor it can lead to inconsistency and later
> failures in things like the CPU frequency driver.

I think the "on every processor" is not fully accurate. _PDC methods
need to be evaluated for every Processor object. Whether that
evaluation is executed on the physical processor that matches the ACPI
UID of the object/device is not mandatory (iow: you can evaluate
the _PDC methods of all Processor objects from the BSP). The usage of
'on' seems to me to imply that the methods are executed on the matching
physical processors.

I would instead use: "... must be made for every processor. If these
_PDC calls are not completed for every processor..."

But I'm not a native English speaker, so this might all be irrelevant.
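
For completeness, on the Linux side evaluating _PDC for a Processor
object boils down to passing it a buffer of three dwords (revision,
count, capability bits).  A minimal sketch modeled on
drivers/acpi/processor_pdc.c, with the capability bits left to the
caller:

#include <linux/acpi.h>

static acpi_status evaluate_pdc(acpi_handle handle, u32 capabilities)
{
        /* _PDC argument: revision 1, one capability dword, the bits. */
        u32 buf[3] = { 1, 1, capabilities };
        union acpi_object arg = {
                .buffer = {
                        .type    = ACPI_TYPE_BUFFER,
                        .length  = sizeof(buf),
                        .pointer = (u8 *)buf,
                },
        };
        struct acpi_object_list arg_list = { .count = 1, .pointer = &arg };

        return acpi_evaluate_object(handle, "_PDC", &arg_list, NULL);
}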

>
> In a Xen system, the dom0 kernel is responsible for system-wide power
> management. The dom0 kernel is in charge of OSPM. However, the Xen
> hypervisor hides some processor information from the dom0 kernel. This
> is presumably done to ensure that the dom0 system has less interference
> with guests that want to use the other processors.

dom0 on a Xen system is just another guest, so the admin can limit the
number of CPUs available to dom0; that's why we get into this weird
situation.

> == Problem ==
>
> But, this leads to a problem: the dom0 kernel needs to run _PDC on all
> the processors, but it can't always see them.
>
> == Solution ==
>
> In dom0 kernels, ignore the existing ACPI method for determining if a
> processor is physically present because it might not be accurate.
> Instead, ask the hypervisor for this information.
>
> This ensures that ...
>
> ----
>
> Is that about right?

Yes, I think it's accurate. I will add it to my commit log, thanks!

On the implementation side, is the proposed approach acceptable?
Mostly asking because it adds Xen conditionals to otherwise generic
ACPI code.

Thanks, Roger.