Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

From: Guenter Roeck
Date: Fri Apr 29 2022 - 13:03:09 EST


On 4/29/22 02:26, Sebastian Ene wrote:
On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
This driver creates per-cpu hrtimers which are required to do the
periodic 'pet' operation. On a conventional watchdog-core driver, the
userspace is responsible for delivering the 'pet' events by writing to
the particular /dev/watchdogN node. In this case we require strong
thread affinity to be able to account for lost time on a per-vCPU basis.

This part of the driver is the 'frontend' which is responsible for
delivering the periodic 'pet' events, configuring the virtual peripheral
and listening for cpu hotplug events. The other part of the driver
handles the peripheral emulation and this part accounts for lost time by
looking at the /proc/{}/task/{}/stat entries and is located here:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
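
As an illustration of the lost-time accounting described above, here is a
minimal userspace sketch of reading per-thread CPU time from a
/proc/<pid>/task/<tid>/stat entry. It is not taken from the crosvm change
linked above. The names parse_task_stat, read_task_cpu_time and vcpu_stalled
are hypothetical, and the stall policy (flag a vCPU only once its thread has
actually run for a full timeout since the last 'pet') is an assumption made
for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* CPU time consumed by one thread, in clock ticks (see sysconf(_SC_CLK_TCK)). */
struct task_cpu_time {
	unsigned long utime;	/* stat field 14: ticks spent in user mode */
	unsigned long stime;	/* stat field 15: ticks spent in kernel mode */
};

/*
 * Parse one line in /proc/<pid>/task/<tid>/stat format.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_task_stat(const char *line, struct task_cpu_time *out)
{
	/* comm (field 2) may contain spaces or ')', so resume after the last ')'. */
	const char *p = strrchr(line, ')');

	if (!p)
		return -1;

	/* Skip state (field 3) and fields 4..13, then read utime and stime. */
	int n = sscanf(p + 1,
		       "%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
		       &out->utime, &out->stime);

	return n == 2 ? 0 : -1;
}

/* Read the stat entry of one thread (for example a vCPU thread of the VMM). */
int read_task_cpu_time(int pid, int tid, struct task_cpu_time *out)
{
	char path[64], line[1024];
	FILE *f;
	char *ok;

	snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", pid, tid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(line, sizeof(line), f);
	fclose(f);

	return ok ? parse_task_stat(line, out) : -1;
}

/*
 * Hypothetical policy: report a stall when the thread has consumed at least
 * timeout_ticks of CPU time since the last 'pet'. Time during which the
 * thread was not scheduled does not count against the vCPU.
 */
int vcpu_stalled(const struct task_cpu_time *at_last_pet,
		 const struct task_cpu_time *now,
		 unsigned long timeout_ticks)
{
	unsigned long ran = (now->utime - at_last_pet->utime) +
			    (now->stime - at_last_pet->stime);

	return ran >= timeout_ticks;
}
```

A backend along these lines would snapshot the counters on each MMIO 'pet'
write and compare them with a fresh reading when its own timer expires.
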

Signed-off-by: Sebastian Ene <sebastianene@xxxxxxxxxx>
---
drivers/misc/Kconfig | 12 +++
drivers/misc/Makefile | 1 +
drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
3 files changed, 219 insertions(+)
create mode 100644 drivers/misc/vm-watchdog.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2b9572a6d114..26c3a99e269c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -493,6 +493,18 @@ config OPEN_DICE
If unsure, say N.
+config VM_WATCHDOG
+ tristate "Virtual Machine Watchdog"
+ select LOCKUP_DETECTOR
+ help
+ Detect CPU locks on the virtual machine. This driver relies on the
+ hrtimers which are pinned to a CPU to do the 'pet' operation. When a vCPU
+ has to do a 'pet', it exits the guest through MMIO write and the
+ backend driver takes into account the lost ticks for this particular
+ CPU.

Hi,


There's nothing to keep this tied to a virtual machine at all, right?
You are just relying on some iomem address to be updated, so it should
be a "generic_iomem_watchdog" driver as there's nothing specific to vms
at all from what I can tell.

thanks,

greg k-h

That's right, although I might use the term "generic lockup detector"

Agreed, that would be a much better name.

Guenter


instead of watchdog. The only reason I would keep the words "virtual machine"
in is that there is no actual hardware for this.

Thanks,
Seb