Re: [PATCH v3] thermal: intel: hfi: Add syscore callbacks

From: Rafael J. Wysocki
Date: Wed Jan 10 2024 - 08:47:17 EST


On Wed, Jan 10, 2024 at 4:05 AM Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> The kernel allocates a memory buffer and provides its location to the
> hardware, which uses it to update the HFI table. This allocation occurs
> during boot and remains constant throughout runtime.
>
> When resuming from hibernation, the restore kernel allocates a second
> memory buffer and reprograms the HFI hardware with the new location as part
> of a normal boot. The location of the second memory buffer may differ from
> the one allocated by the image kernel. When the restore kernel transfers
> control to the image kernel, the second buffer becomes invalid, potentially
> leading to memory corruption if the hardware writes to it (the hardware
> continues using the buffer from the restore kernel).
>
> It is also possible that the hardware forgets the address of the memory
> buffer when resuming from "deep" suspend. Memory corruption may also occur
> in such a scenario.
>
> To prevent the described memory corruption, disable HFI when preparing to
> suspend or hibernate. Enable it when resuming.
>
> Add syscore callbacks to handle the package of the boot CPU (packages of
> non-boot CPUs are handled via CPU offline). Syscore ops always run on the
> boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
> and hibernation. Syscore ops only run in these cases.
>
> Cc: Chen Yu <yu.c.chen@xxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> Cc: Stanislaw Gruszka <stanislaw.gruszka@xxxxxxxxxxxxxxx>
> Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> Cc: Zhao Liu <zhao1.liu@xxxxxxxxxxxxxxx>
> Cc: linux-pm@xxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx # 6.1
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> ---
> This is v3 of the patch [4/4] of [1]. Rafael has already accepted the
> first three patches. Thanks!
>
> I tested this patch on the testing branch of Rafael on Jan 9th. 2024. I
> tested it on a Meteor Lake system. It completed 1000 hibernation-resume
> cycles.
>
> [1]. https://lore.kernel.org/all/20240103041459.11113-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
>
> ---
> Changes since v2:
> * Switched to syscore ops instead of a suspend notifier as it is a better
> fit: we only need to handle the boot CPU, disabling HFI is only needed
> for "deep" suspend and hibernation. (Rafael)
> * Removed incorrect statement about the suspend flow in the commit
> message. (Rafael)
>
> Changes since v1:
> * Moved registration of the suspend notifier towards the end of
> intel_hfi_init(). (Stan)
> * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
> will use these functions outside the suspend notifier. (Stan)
> * Added locking to calls to hfi_[enable|disable]() from the suspend
> notifier. (Rafael)
> ---
> drivers/thermal/intel/intel_hfi.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> index 22445403b520..768cd5f9eb32 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -35,7 +35,9 @@
> #include <linux/processor.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> +#include <linux/suspend.h>
> #include <linux/string.h>
> +#include <linux/syscore_ops.h>
> #include <linux/topology.h>
> #include <linux/workqueue.h>
>
> @@ -571,6 +573,30 @@ static __init int hfi_parse_features(void)
> return 0;
> }
>
> +static void hfi_do_enable(void)
> +{
> + /* If we are here, we are on the boot CPU */
> + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> + struct hfi_instance *hfi_instance = info->hfi_instance;
> +
> + /* No locking needed. There is no concurrency with CPU online. */
> + hfi_set_hw_table(hfi_instance);
> + hfi_enable();
> +}
> +
> +static int hfi_do_disable(void)
> +{
> + /* No locking needed. There is no concurrency with CPU offline. */
> + hfi_disable();
> +
> + return 0;
> +}
> +
> +static struct syscore_ops hfi_pm_ops = {
> + .resume = hfi_do_enable,
> + .suspend = hfi_do_disable,
> +};
> +
> void __init intel_hfi_init(void)
> {
> struct hfi_instance *hfi_instance;
> @@ -602,6 +628,8 @@ void __init intel_hfi_init(void)
> if (!hfi_updates_wq)
> goto err_nomem;
>
> + register_syscore_ops(&hfi_pm_ops);
> +
> return;
>
> err_nomem:
> --

Applied as 6.8-rc1 material, with edited subject and changelog, and
I've reworded a comment in hfi_do_enable() to speak about the code.

Thanks!