Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

From: Ricardo Neri
Date: Wed Jan 03 2024 - 22:01:36 EST


On Wed, Jan 03, 2024 at 02:38:26PM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2024 at 2:34 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > The subject should say "add a PM notifier" to indicate that
> > hibernation is covered too.
> >
> > On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
> > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> > >
> > > The kernel gives the HFI hardware a memory region that the latter uses to
> > > provide updates to the HFI table. The kernel allocates this memory region
> > > at boot. It remains constant throughout runtime time.
> > >
> > > When resuming from suspend or hibernation, the restore kernel allocates a
> >
> > The restore kernel is only used during resume from hibernation, so
> > this particular problem is hibernation-specific.
> >
> > It is possible, at least in principle, that the address of the HFI
> > table is "lost" by the processor during resume from "deep" suspend
> > (ACPI S3), in which case it may not survive the firmware-driven part
> > of the suspend-resume cycle. It is thus prudent to disable HFI on
> > suspend and re-enable it on resume for the boot CPU (under the
> > assumption that the other CPUs will be taken care of by CPU offline),
> > but for a somewhat different reason than in the hibernation case.
> >
> > > second memory buffer and reprograms the HFI hardware with the new location
> > > as part of a normal boot. The location of the second memory buffer may
> > > differ from the one allocated by the image kernel. Subsequently, when the
> > > restore kernel transfers control to the image kernel, the second buffer
> > > becomes invalid, potentially leading to memory corruption if the hardware
> > > writes to it (hardware continues using the buffer from the restore kernel).
> > >
> > > Add a suspend notifier to disable all HFI instances before jumping to the
> > > image kernel and enable them once the image kernel has been restored. Use
> > > the memory buffer that the image kernel allocated.
> > >
> > > For non-boot CPUs, rely on the CPU hotplug callbacks as CPUs are disabled
> > > and enabled during suspend and resume, respectively.
> > >
> > > The CPU hotplug callbacks do not cover the boot CPU. Handle the HFI
> > > instance of the boot CPU from the suspend notifier callback.
> > >
> > > Cc: Chen Yu <yu.c.chen@xxxxxxxxx>
> > > Cc: Len Brown <len.brown@xxxxxxxxx>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > Cc: Stanislaw Gruszka <stanislaw.gruszka@xxxxxxxxxxxxxxx>
> > > Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> > > Cc: Zhao Liu <zhao1.liu@xxxxxxxxxxxxxxx>
> > > Cc: linux-pm@xxxxxxxxxxxxxxx
> > > Cc: stable@xxxxxxxxxxxxxxx # 6.1
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > > --
> > > Changes since v1:
> > > * Moved registration of the suspend notifier towards the end of
> > > intel_hfi_init(). (Stan)
> > > * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
> > > will use these functions outside the suspend notifier. (Stan)
> > > * Added locking to calls to hfi_[enable|disable]() from the suspend
> > > notifier. (Rafael)
> > > ---
> > > drivers/thermal/intel/intel_hfi.c | 62 +++++++++++++++++++++++++++++++
> > > 1 file changed, 62 insertions(+)
> > >
> > > diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> > > index 22445403b520..8d6e4f8dc67a 100644
> > > --- a/drivers/thermal/intel/intel_hfi.c
> > > +++ b/drivers/thermal/intel/intel_hfi.c
> > > @@ -30,11 +30,13 @@
> > > #include <linux/kernel.h>
> > > #include <linux/math.h>
> > > #include <linux/mutex.h>
> > > +#include <linux/notifier.h>
> > > #include <linux/percpu-defs.h>
> > > #include <linux/printk.h>
> > > #include <linux/processor.h>
> > > #include <linux/slab.h>
> > > #include <linux/spinlock.h>
> > > +#include <linux/suspend.h>
> > > #include <linux/string.h>
> > > #include <linux/topology.h>
> > > #include <linux/workqueue.h>
> > > @@ -571,6 +573,60 @@ static __init int hfi_parse_features(void)
> > > return 0;
> > > }
> > >
> > > +static void hfi_do_enable(void *info)
> > > +{
> > > + struct hfi_instance *hfi_instance = info;
> > > +
> > > + hfi_set_hw_table(hfi_instance);
> > > + hfi_enable();
> > > +}
> > > +
> > > +static void hfi_do_disable(void *info)
> > > +{
> > > + hfi_disable();
> > > +}
> > > +
> > > +static int hfi_pm_notify(struct notifier_block *nb,
> > > + unsigned long mode, void *unused)
> > > +{
> > > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> > > + struct hfi_instance *hfi = info->hfi_instance;
> > > + int ret = 0;
> > > +
> > > + /* HFI may not be in use. */
> > > + if (!hfi)
> > > + return ret;
> > > +
> > > + mutex_lock(&hfi_instance_lock);
> > > + /*
> > > + * Only handle the HFI instance of the package of the boot CPU. The
> > > + * instances of other packages are handled in the CPU hotplug callbacks.
> > > + */
> > > + switch (mode) {
> > > + case PM_HIBERNATION_PREPARE:
> > > + case PM_SUSPEND_PREPARE:
> > > + case PM_RESTORE_PREPARE:
> > > + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> > > + break;
> > > +
> > > + case PM_POST_RESTORE:
> > > + case PM_POST_HIBERNATION:
> > > + case PM_POST_SUSPEND:
> > > + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> > > + break;
> >
> > Because this handles the boot CPU only, one has to wonder if it should
> > be a syscore op rather than a PM notifier.
> >
> > It does not sleep AFAICS, so it can run in that context, and it is
> > guaranteed to run on the boot CPU then, so it is not necessary to
> > force that. Moreover, syscore ops are guaranteed to be
> > non-concurrent, so locking is not needed.
> >
> > In addition, disabling HFI from a PM notifier is generally observable
> > by user space, because PM notifiers run before user space is frozen,
> > but doing it from a syscore op wouldn't be.

Yes, we only have to handle the boot CPU. The rest are handled via CPU
offline. Then syscore ops look like a good fit for me.

>
> One more thing: PM notifiers run on all variants of system suspend and
> resume, including suspend-to-idle in which case HFI need not be
> disabled/enabled IIUC and syscore ops only run in hibernation and
> "deep" suspend cycles, so they cover the cases in which the special
> handling is really needed and don't add useless overhead otherwise.

I verified that the HFI configuration survives suspend-to-idle. No extra
handling is needed.