Re: [PATCH] Updates to Xen hypercall preemption

From: Peter Zijlstra
Date: Wed Jun 21 2023 - 12:41:43 EST


On Wed, Jun 21, 2023 at 03:14:42PM +0000, Per Bilse wrote:
> Some Xen hypercalls issued by dom0 guests may run for many 10s of
> seconds, potentially causing watchdog timeouts and other problems.
> It's rare for this to happen, but it does in extreme circumstances,
> for instance when shutting down VMs with very large memory allocations
> (> 0.5 - 1TB). These hypercalls are preemptible, but the fixes in the
> kernel to ensure preemption have fallen into a state of disrepair, and
> are currently ineffective. This patch brings things up to date by way of:

I don't understand it -- fundamentally, how can Linux schedule when the
guest isn't even running? A hypercall transfers control to the
host/hypervisor and leaves the guest suspended.

> 1) Update general feature selection from XEN_PV to XEN_DOM0.
> The issue is unique to dom0 Xen guests, but isn't unique to PV dom0s,
> and will occur in future PVH dom0s. XEN_DOM0 depends on either PV or PVH,
> as well as the appropriate details for dom0.
>
> 2) Update specific feature selection from !PREEMPTION to !PREEMPT.
> The following table shows the relationship between different preemption
> features and their indicators/selectors (Y = "=Y", N = "is not set",
> . = absent):
>
>                          | np-s | np-d | vp-s | vp-d | fp-s | fp-d
> CONFIG_PREEMPT_DYNAMIC   |  N   |  Y   |  N   |  Y   |  N   |  Y
> CONFIG_PREEMPTION        |  .   |  Y   |  .   |  Y   |  Y   |  Y
> CONFIG_PREEMPT           |  N   |  N   |  N   |  N   |  Y   |  Y
> CONFIG_PREEMPT_VOLUNTARY |  N   |  N   |  Y   |  Y   |  N   |  N
> CONFIG_PREEMPT_NONE      |  Y   |  Y   |  N   |  N   |  N   |  N
>
> Unless PREEMPT is set, we need to enable the fixes.
>
> 3) Update flag access from __this_cpu_XXX() to raw_cpu_XXX().
> The long-running hypercalls are flagged by way of a per-cpu variable
> which is set before and cleared after the relevant calls. This elicits
> a warning "BUG: using __this_cpu_write() in preemptible [00000000] code",
> but xen_pv_evtchn_do_upcall() deals specifically with this. For
> consistency, flag testing is also updated, and the code is simplified
> and tidied accordingly.

This makes no sense; the race that warning warns about is:

	CPU0			CPU1
	per-cpu write
	<preempt-out>
				<preempt-in>
				do-hypercall

So you wrote the value on CPU0, got migrated to CPU1 because you had
preemption enabled, and then continued with the percpu value of CPU1
because that's where you're at now.

Simply making the warning go away doesn't help: CPU1 does the hypercall
while the store happened on CPU0.
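
A minimal userspace sketch of the race described above, not kernel code.
The names in_hcall, current_cpu, migrate_to() and
flag_seen_at_hypercall() are hypothetical stand-ins for illustration;
the per-cpu flag is modelled as a plain array indexed by the CPU the
simulated task is currently on.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-in for the per-cpu "in long hypercall" flag. */
static int in_hcall[NR_CPUS];

/* Hypothetical: the CPU the simulated task is currently running on. */
static int current_cpu;

/* Models a per-cpu write/read: always hits the *current* CPU's slot. */
static void percpu_write(int val) { in_hcall[current_cpu] = val; }
static int percpu_read(void) { return in_hcall[current_cpu]; }

/* Models <preempt-out> on one CPU followed by <preempt-in> on another. */
static void migrate_to(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    current_cpu = cpu;
}

/*
 * Sets the flag on CPU0, optionally migrates to CPU1, and returns the
 * flag value visible on the CPU that would issue the hypercall.
 */
int flag_seen_at_hypercall(int migrate)
{
    for (int i = 0; i < NR_CPUS; i++)
        in_hcall[i] = 0;
    current_cpu = 0;

    percpu_write(1);        /* per-cpu write on CPU0 */
    if (migrate)
        migrate_to(1);      /* preempted and resumed on CPU1 */

    return percpu_read();   /* what do-hypercall's CPU sees */
}
```

Without migration the flag reads back as set; with migration the
hypercall's CPU sees it clear, whether or not the accessor warns.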

> 4) Update irqentry_exit_cond_resched() to raw_irqentry_exit_cond_resched().
> The code will call irqentry_exit_cond_resched() if the flag (as noted
> above) is set, but the dynamic preemption feature will livepatch that
> function to a no-op unless full preemption is selected. The code is
> therefore updated to call raw_irqentry_exit_cond_resched().

That, again, needs more explanation. Why do you want this if not
preemptible?
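
For reference, the behaviour item 4 describes can be modelled in
userspace roughly as below. This is only a sketch of the claim as
quoted, assuming the patched call site behaves like a function pointer
that is redirected to a no-op unless full preemption is selected; it
is not the kernel's static call machinery, and every name in it is
hypothetical.

```c
#include <assert.h>

enum preempt_mode { MODE_NONE, MODE_VOLUNTARY, MODE_FULL };

/* Counts how often the real resched body ran. */
static int resched_calls;

/* Hypothetical stand-in for raw_irqentry_exit_cond_resched(). */
static void raw_cond_resched_model(void) { resched_calls++; }

/* What the call site is patched to when full preemption is off. */
static void nop_model(void) { }

/* Hypothetical stand-in for the patchable irqentry_exit_cond_resched(). */
static void (*cond_resched_target)(void) = raw_cond_resched_model;

/* Models selecting a dynamic preemption mode at boot or runtime. */
void select_preempt_mode(enum preempt_mode mode)
{
    assert(mode >= MODE_NONE && mode <= MODE_FULL);
    cond_resched_target = (mode == MODE_FULL) ? raw_cond_resched_model
                                              : nop_model;
}

/* Calls through the patchable target; returns how often the body ran. */
int resched_via_wrapper(enum preempt_mode mode)
{
    resched_calls = 0;
    select_preempt_mode(mode);
    cond_resched_target();
    return resched_calls;
}

/* Calls the raw function directly, bypassing the patched target. */
int resched_via_raw(enum preempt_mode mode)
{
    resched_calls = 0;
    select_preempt_mode(mode);
    raw_cond_resched_model();
    return resched_calls;
}
```

In this model the wrapper does nothing in the none and voluntary
modes, while the raw call always runs, which is the difference the
patch description relies on.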

You're doing 4 things; that should be 4 patches. Also, please give more
clues as to how this is supposed to work at all.