Re: [PATCH] rcu/tree: consider time a VM was suspended

From: Paul E. McKenney
Date: Mon May 17 2021 - 12:23:57 EST


On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote:
> Soft watchdog timer function checks if a virtual machine
> was suspended and hence what looks like a lockup in fact
> is a false positive.
>
> This is what kvm_check_and_clear_guest_paused() does: it
> tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> and if it's set then we need to touch all watchdogs and bail
> out.
>
> Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> check works fine.
>
> There is, however, one more watchdog that runs from IRQ, so
> watchdog timer fn races with it, and that watchdog is not aware
> of PVCLOCK_GUEST_STOPPED - RCU stall detector.
>
> apic_timer_interrupt()
> smp_apic_timer_interrupt()
> hrtimer_interrupt()
> __hrtimer_run_queues()
> tick_sched_timer()
> tick_sched_handle()
> update_process_times()
> rcu_sched_clock_irq()
>
> This triggers RCU stalls on our devices during VM resume.
>
> If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> then there is nothing on this VCPU that touches watchdogs and
> RCU reads stale gp stall timestamp and new jiffies value, which
> makes it think that RCU has stalled.
>
> Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> don't report RCU stalls when we resume the VM.

Good point!

But if I understand your patch correctly, if the virtual machine is
stopped at any point during a grace period, even if only for a short time,
stall warnings are suppressed for that grace period forever, courtesy of
the call to rcu_cpu_stall_reset(). So, if something really is stalling,
and the virtual machine just happens to stop for a few milliseconds, the
stall warning is completely suppressed. Which would make it difficult
to debug the underlying stall condition.

Is it possible to provide RCU with information on the duration of the
virtual-machine stoppage so that RCU could adjust the timing of the
stall warning? Maybe by having something like rcu_cpu_stall_reset()
that takes the duration of the stoppage in jiffies?

Please note that I am not completely opposed to your patch below, but
I figured that this was a good time to ask if we can do better.

Thanx, Paul

> Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
> ---
> kernel/rcu/tree_stall.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 59b95cc5cbdf..cb233c79f0bc 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -7,6 +7,8 @@
> * Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> */
>
> +#include <asm/kvm_para.h>
> +
> //////////////////////////////////////////////////////////////////////////////
> //
> // Controlling CPU stall warnings, including delay calculation.
> @@ -695,6 +697,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
> (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>
> + /*
> + * If a virtual machine is stopped by the host it can look to
> + * the watchdog like an RCU stall. Check to see if the host
> + * stopped the vm.
> + */
> + if (kvm_check_and_clear_guest_paused())
> + return;
> +
> /* We haven't checked in, so go dump stack. */
> print_cpu_stall(gps);
> if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> @@ -704,6 +714,14 @@ static void check_cpu_stall(struct rcu_data *rdp)
> ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>
> + /*
> + * If a virtual machine is stopped by the host it can look to
> + * the watchdog like an RCU stall. Check to see if the host
> + * stopped the vm.
> + */
> + if (kvm_check_and_clear_guest_paused())
> + return;
> +
> /* They had a few time units to dump stack, so complain. */
> print_other_cpu_stall(gs2, gps);
> if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> --
> 2.31.1.751.gd2f1c929bd-goog
>