Re: [PATCH v4 0/2] Detect stalls on guest vCPUS

From: Rob Herring
Date: Fri Apr 29 2022 - 16:25:54 EST


On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote:
> This adds a mechanism to detect stalls on the guest vCPUs by creating a
> per CPU hrtimer which periodically 'pets' the host backend driver.
> With a conventional watchdog-core driver, userspace is responsible for
> delivering the 'pet' events by writing to the corresponding /dev/watchdogN node.
> In this case we require a strong thread affinity to be able to
> account for lost time on a per vCPU basis.
>
> This device driver acts as a soft lockup detector by relying on the host
> backend driver to measure the elapsed time between subsequent 'pet' events.
> If the elapsed time doesn't match an expected value, the backend driver
> decides that the guest vCPU is locked and resets the guest. The host
> backend driver takes into account the time that the guest is not
> running. The communication with the backend driver is done through MMIO
> and the register layout of the virtual watchdog is described as part of
> the backend driver changes.
>
> The host backend driver is implemented as part of:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>
> Changelog v4:
> - rename the source from vm-wdt.c -> vm-watchdog.c
> - convert all the error logging calls from pr_* to dev_* calls
> - rename the DTS node "clock" to "clock-frequency"
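
For readers following along, the host-side check described in the cover
letter above could be sketched roughly as below. This is only an
illustration under stated assumptions, not code from the patch or from
crosvm: the struct, the function names, and the "running time exceeds a
timeout" rule are all hypothetical, and the real backend may compare
against the expected value differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU watchdog state kept by a host backend. */
struct vcpu_wdt {
	uint64_t last_pet_ns;    /* host time of the most recent 'pet' */
	uint64_t not_running_ns; /* time the vCPU was not running since then */
	uint64_t timeout_ns;     /* assumed stall threshold */
};

/* Record a 'pet' from the guest and restart the accounting window. */
static void vcpu_wdt_pet(struct vcpu_wdt *w, uint64_t now_ns)
{
	w->last_pet_ns = now_ns;
	w->not_running_ns = 0;
}

/* Account for time during which the vCPU was descheduled by the host. */
static void vcpu_wdt_account_stopped(struct vcpu_wdt *w, uint64_t ns)
{
	w->not_running_ns += ns;
}

/*
 * Report a stall when the time the vCPU actually ran since the last
 * 'pet' exceeds the timeout. Time spent not running is subtracted first.
 */
static bool vcpu_wdt_stalled(const struct vcpu_wdt *w, uint64_t now_ns)
{
	uint64_t elapsed, running;

	assert(w->timeout_ns > 0);
	elapsed = now_ns - w->last_pet_ns;
	running = elapsed > w->not_running_ns ?
		  elapsed - w->not_running_ns : 0;
	return running > w->timeout_ns;
}
```

The point of the sketch is the subtraction in vcpu_wdt_stalled(): a vCPU
that was simply not scheduled by the host should not be reported as
locked up.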

Why do I have a v4 now when the discussion on v3 is not concluded? Give
folks some time to respond. We're busy drinking from the firehose.

Rob