Re: Temporary KVM guest hangs connected to KSM and NUMA balancer

From: Friedrich Weber
Date: Fri Jan 12 2024 - 11:08:40 EST


On 11/01/2024 17:00, Sean Christopherson wrote:
> This is a known issue. It's mostly a KVM bug[1][2] (fix posted[3]), but I suspect
> that a bug in the dynamic preemption model logic[4] is also contributing to the
> behavior by causing KVM to yield on preempt models where it really shouldn't.

Thanks a lot for the pointers and the proposed fixes!

I still see the same temporary hangs with [3] applied on top of 6.7
(0dd3ee31). However, with [4] applied in addition, I have not seen any
temporary hangs yet.

As v1 of [3] was reported to fix the bug reported in [2] and looks very
similar to the v2 I tried, I wonder whether I might be seeing a slightly
different kind of hang than the one reported in [2] -- also because the
reproducer relies heavily on KSM and AFAICT, KSM was entirely disabled
in [2]. I'll try to run a few more tests next week.

FWIW, the kernel config relevant to preemption:

CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE=7500
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
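
For anyone comparing their own config against the one above, here is a
minimal sketch (not taken from the kernel tree; the helper names
parse_kconfig and default_preempt_model are made up for illustration)
that reads options in this format and reports the build-time default
preemption model. It assumes the default follows from which of
CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT is set;
with CONFIG_PREEMPT_DYNAMIC=y the model actually in effect may differ,
e.g. if preempt= is passed on the kernel command line.

```python
import re

# A few of the options quoted above, pasted verbatim.
CONFIG_TEXT = """\
CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_DYNAMIC=y
"""


def parse_kconfig(text):
    """Map option names to their value, or to None for 'is not set' lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = None
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options


def default_preempt_model(options):
    """Return the build-time default model name (hypothetical helper)."""
    candidates = (
        ("CONFIG_PREEMPT_NONE", "none"),
        ("CONFIG_PREEMPT_VOLUNTARY", "voluntary"),
        ("CONFIG_PREEMPT", "full"),
    )
    for name, model in candidates:
        if options.get(name) == "y":
            return model
    return "unknown"


options = parse_kconfig(CONFIG_TEXT)
model = default_preempt_model(options)
dynamic = options.get("CONFIG_PREEMPT_DYNAMIC") == "y"
print(f"default model: {model}, dynamic switching built in: {dynamic}")
```

For the config above this reports "voluntary" as the default, with
dynamic switching built in.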

Thanks again!

Friedrich

> [1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@xxxxxxxxxxxxxxxxxxxxxxxxx
> [2] https://lore.kernel.org/all/bug-218259-28872@xxxxxxxxxxxxxxxxxxxxxxxxx%2F
> [3] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@xxxxxxxxxx
> [4] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@xxxxxxxxxx