Re: [RFC PATCH 1/2] suspend: extend S2Idle ops by new notify handler

From: Rafael J. Wysocki
Date: Mon Feb 06 2023 - 13:58:50 EST


On Wed, Nov 30, 2022 at 1:25 PM Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> wrote:
>
> Hi Rafael,
>
> Kindly reminder about this topic.

Well, it's been a bit outdated since Mario's changes have been integrated.

> BTW I've noticed that in the meantime v. similar patch was merged but
> aimed for debugging purposes [1] (it uses s/notify/check and invokes
> the callback a bit earlier just before s2idle_entry).
> Perhaps combining Mario's [1] with aligned to it patch #2 of this
> series [2] could be used and accepted as s2idle notifications method?

Maybe it could.

Please send a new series based on top of the current mainline kernel
(6.2-rc7 as of yesterday) and we'll see.

> [1] https://patchwork.kernel.org/project/linux-pm/patch/20220829162953.5947-2-mario.limonciello@xxxxxxx
> [2] https://patchwork.kernel.org/project/linux-pm/patch/20220707125329.378277-3-jaz@xxxxxxxxxxxx/
>
> pon., 12 wrz 2022 o 16:44 Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> napisał(a):
> >
> > Hi Rafael,
> >
> > Gentle ping
> >
> > Best regards,
> > Grzegorz
> >
> > pon., 22 sie 2022 o 11:26 Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> napisał(a):
> > >
> > > Hi Rafael,
> > >
> > > Could you please kindly comment on the above?
> > >
> > > Thank you in advance,
> > > Grzegorz
> > >
> > > śr., 20 lip 2022 o 15:15 Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> napisał(a):
> > > >
> > > > wt., 19 lip 2022 o 20:09 Rafael J. Wysocki <rafael@xxxxxxxxxx> napisał(a):
> > > > >
> > > > > On Thu, Jul 7, 2022 at 2:56 PM Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Currently the LPS0 prepare_late callback is aimed to run as the very
> > > > > > last thing before entering the S2Idle state from LPS0 perspective,
> > > > > > nevertheless between this call and the system actually entering the
> > > > > > S2Idle state there are several places where the suspension process could
> > > > > > be canceled.
> > > > >
> > > > > And why is this a problem?
> > > > >
> > > > > The cancellation will occur only if there is a wakeup signal that
> > > > > would otherwise cause one of the CPUs to exit the idle state. Such a
> > > > > wakeup signal can appear after calling the new notifier as well, so
> > > > > why does it make a difference?
> > > >
> > > > It could also occur due to suspend_test. Additionally with new
> > > > notifier we could get notification when the system wakes up from
> > > > s2idle_loop and immediately goes to sleep again (due to e.g.
> > > > acpi_s2idle_wake condition not being met) - in this case relying on
> > > > prepare_late callback is not possible since it is not called in this
> > > > path.
> > > >
> > > > >
> > > > > > In order to notify VMM about guest entering suspend, extend the S2Idle
> > > > > > ops by new notify callback, which will be really invoked as a very last
> > > > > > thing before guest actually enters S2Idle state.
> > > > >
> > > > > It is not guaranteed that "suspend" (defined as all CPUs entering idle
> > > > > states) will be actually entered even after this "last step".
> > > >
> > > > Since this whole patchset is aimed at notifying the host about a guest
> > > > entering s2idle state, reaching this step can be considered as a
> > > > suspend "entry point" for VM IMO. It is because we are talking about
> > > > the vCPU not the real CPU. Therefore it seems to me, that even if some
> > > > other vCPUs could still get some wakeup signal they will not be able
> > > > to kick (through s2idle_wake->swake_up_one(&s2idle_wait_head);) the
> > > > original vCPU which entered s2idle_loop, triggered the new notifier
> > > > and is halted due to handling vCPU exit (and was about to trigger
> > > > swait_event_exclusive). So it will prevent the VM's resume process
> > > > from being started.
> > > >
> > > > >
> > > > > > Additionally extend the acpi_s2idle_dev_ops by notify() callback so
> > > > > > any driver can hook into it and allow to implement its own notification.
> > > > > >
> > > > > > Taking advantage of e.g. existing acpi_s2idle_dev_ops's prepare/restore
> > > > > > hooks is not an option since it will not allow to prevent race
> > > > > > conditions:
> > > > > > - VM0 enters s2idle
> > > > > > - host notes about VM0 is in s2idle
> > > > > > - host continues with system suspension but in the meantime VM0 exits
> > > > > > s2idle and sends notification but it is already too late (VM could not
> > > > > > even send notification on time).
> > > > >
> > > > > Too late for what?
> > > >
> > > > Too late to cancel the host suspend process, which thinks that the VM
> > > > is in s2idle state while it isn't.
> > > >
> > > > >
> > > > > > Introducing notify() as a very last step before the system enters S2Idle
> > > > > > together with an assumption that the VMM has control over guest
> > > > > > resumption allows preventing mentioned races.
> > > > >
> > > > > How does it do that?
> > > >
> > > > At the moment when VM triggers this new notifier we trap on MMIO
> > > > access and the VMM handles vCPU exit (so the vCPU is "halted").
> > > > Therefore the VMM could control when it finishes such handling and
> > > > releases the vCPU again.
> > > >
> > > > Maybe adding some more context will be helpful. This patchset was
> > > > aimed for two different scenarios actually:
> > > > 1) Host is about to enter the suspend state and needs first to suspend
> > > > VM with all pass-through devices. In this case the host waits for
> > > > s2idle notification from the guest and when it receives it, it
> > > > continues with its own suspend process.
> > > > 2) Guest could be a "privileged" one (in terms of VMM) and when the
> > > > guest enters s2idle state it notifies the host, which in turn triggers
> > > > the suspend process of the host.
> > > >
> > > > >
> > > > > It looks like you want suspend-to-idle to behave like S3 and it won't.
> > > >
> > > > In a way, yes, we compensate for the lack of something like PM1_CNT to
> > > > trap on for detecting that the guest is suspending.
> > > > We could instead force the guest to use S3 but IMO it is undesirable,
> > > > since it generally does make a difference which suspend mode is used
> > > > in the guest, s2idle or S3, e.g some drivers check which suspend type
> > > > is used and based on that behaves differently during suspend. One of
> > > > the example is:
> > > > https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2323
> > > > https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1069
> > > > https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c#L583
> > > >
> > > > Thank you,
> > > > Grzegorz