Re: [RFC PATCH 1/5] entry: Pass pt_regs to irqentry_exit_cond_resched()

From: Thomas Gleixner
Date: Tue Aug 09 2022 - 19:18:40 EST


On Mon, Aug 08 2022 at 12:38, Borislav Petkov wrote:
> On Fri, Aug 05, 2022 at 10:30:05AM -0700, ira.weiny@xxxxxxxxx wrote:
>> ---
>> arch/arm64/include/asm/preempt.h | 2 +-
>> arch/arm64/kernel/entry-common.c | 4 ++--
>> arch/x86/entry/common.c | 2 +-
>> include/linux/entry-common.h | 17 ++++++++------
>> kernel/entry/common.c | 13 +++++++----
>> kernel/sched/core.c | 40 ++++++++++++++++----------------
>> 6 files changed, 43 insertions(+), 35 deletions(-)
>
> Why all this churn?
>
> Why can't you add a parameter to irqentry_exit():
>
> noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state, bool cond_resched);
>
> and then have all callers except xen_pv_evtchn_do_upcall() pass in false
> and this way have all exit paths end up in irqentry_exit()?
>
> And, ofc, move the true case which is the body of
> raw_irqentry_exit_cond_resched() to irqentry_exit() and then get rid of
> former.
>
> xen_pv_evtchn_do_upcall() will, ofc, do:
>
> 	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
> 		irqentry_exit(regs, state, true);
> 		instrumentation_end();
> 		restore_inhcall(inhcall);
> 	} else {
> 		instrumentation_end();
> 		irqentry_exit(regs, state, false);
> 	}
>

How is that less churn? Care to do:

git grep 'irqentry_exit(' arch/

The real question is:

Why is it required that irqentry_exit_cond_resched() handles any of
the auxiliary pt regs space?

That's completely unanswered by the changelog and absolutely irrelevant
for the problem at hand, i.e. storing the CPU number on irq/exception
entry.

So why is this patch in this series at all?

But even for future purposes it is more than questionable. Why?

Contrary to the claim of the changelog, xen_pv_evtchn_do_upcall() is not
really a bypass of irqentry_exit(). The point is:

The hypercall is issued by the kernel from privcmd_ioctl_hypercall()
which does:

	xen_preemptible_hcall_begin();
	hypercall();
	xen_preemptible_hcall_end();

So any upcall from the hypervisor to the guest semantically hits
regular interrupt-enabled guest kernel space. If that code called
irqentry_exit(), it would run through the kernel exit code path:

	} else if (!regs_irqs_disabled(regs)) {

		instrumentation_begin();
		if (IS_ENABLED(CONFIG_PREEMPTION))
			irqentry_exit_cond_resched();

		/* Covers both tracing and lockdep */
		trace_hardirqs_on();
		instrumentation_end();
	} ....

Which would fail to invoke irqentry_exit_cond_resched() if
CONFIG_PREEMPTION=n. But the whole point of this exercise is to allow
preemption from the upcall even when the kernel has CONFIG_PREEMPTION=n.
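
To make the difference concrete, here is a minimal userspace sketch of
the two exit paths. It is a model, not kernel code: every model_* name
is hypothetical, the 'preemption' flag stands in for
IS_ENABLED(CONFIG_PREEMPTION), and the reschedule is reduced to a
counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler: counts reschedule attempts. */
static int model_resched_calls;

static void model_cond_resched(void)
{
	model_resched_calls++;
}

/*
 * Model of the kernel-mode branch of irqentry_exit() quoted above.
 * 'irqs_disabled' stands in for regs_irqs_disabled(regs) and
 * 'preemption' for IS_ENABLED(CONFIG_PREEMPTION).
 */
static void model_irqentry_exit(bool irqs_disabled, bool preemption)
{
	if (irqs_disabled)
		return;
	if (preemption)
		model_cond_resched();
}

/*
 * Model of the upcall exit: inside a preemptible hypercall the
 * reschedule is attempted regardless of 'preemption'. Otherwise the
 * regular exit path is taken.
 */
static void model_xen_upcall_exit(bool inhcall, bool irqs_disabled,
				  bool preemption)
{
	if (inhcall)
		model_cond_resched();
	else
		model_irqentry_exit(irqs_disabled, preemption);
}

/* Usage: with preemption off only the upcall path reschedules. */
static void demo_exit_paths(void)
{
	model_resched_calls = 0;

	model_irqentry_exit(false, false);
	assert(model_resched_calls == 0);

	model_xen_upcall_exit(true, false, false);
	assert(model_resched_calls == 1);
}
```

Routing the upcall through the regular exit would make the second call
in demo_exit_paths() a no-op as well, which is exactly what the
preemptible hypercall mechanism is meant to avoid.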

Staring at this XENPV code there are two issues:

1) That code lacks a trace_hardirqs_on() after the call to
irqentry_exit_cond_resched(). My bad.

2) CONFIG_PREEMPT_DYNAMIC broke this mechanism.

If the static call/key is disabled then the call becomes a NOP.

Frederic?
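
Issue 2) can be illustrated with a small userspace sketch, under the
assumption that the static call behaves like a function pointer which
is repointed at a no-op when dynamic preemption is switched off. All
dyn_* and model_* names are hypothetical, and the real function checks
more state than this counter does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for actual rescheduling. */
static int dyn_resched_calls;

/* Stand-in for raw_irqentry_exit_cond_resched(). */
static void model_raw_cond_resched(void)
{
	dyn_resched_calls++;
}

static void model_nop(void)
{
}

/* Stand-in for the static call: a patchable indirect call. */
static void (*dyn_cond_resched)(void) = model_raw_cond_resched;

/* Model of switching the preemption mode at boot or run time. */
static void model_preempt_dynamic_set(bool full)
{
	dyn_cond_resched = full ? model_raw_cond_resched : model_nop;
}

/* Upcall exit going through the patchable call: silently a NOP. */
static void model_upcall_exit_via_static_call(void)
{
	dyn_cond_resched();
}

/* Upcall exit calling the raw function: unaffected by the mode. */
static void model_upcall_exit_raw(void)
{
	model_raw_cond_resched();
}

/* Usage: with the call disabled only the raw variant reschedules. */
static void demo_preempt_dynamic(void)
{
	dyn_resched_calls = 0;
	model_preempt_dynamic_set(false);

	model_upcall_exit_via_static_call();
	assert(dyn_resched_calls == 0);

	model_upcall_exit_raw();
	assert(dyn_resched_calls == 1);
}
```

This is why calling the raw variant directly from the upcall path keeps
the mechanism working independent of the dynamic preemption mode.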

Both clearly demonstrate how well tested this XEN_PV muck is, which means
we should just delete it.

Anyway. This wants the fix below.

If there is still a need to do anything about this XEN cond_resched()
muck for the PREEMPTION=n case for future auxregs usage then this can be
simply hidden in this new XEN helper and does not need any change to the
rest of the code.

I doubt that this is required, but if so then there needs to be a very
concise explanation and not just incomprehensible hand-waving word
salad.

Thanks,

tglx
---
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -283,9 +283,18 @@ static __always_inline void restore_inhc
 {
 	__this_cpu_write(xen_in_preemptible_hcall, inhcall);
 }
+
+static __always_inline void xenpv_irqentry_exit_cond_resched(void)
+{
+	instrumentation_begin();
+	raw_irqentry_exit_cond_resched();
+	trace_hardirqs_on();
+	instrumentation_end();
+}
 #else
 static __always_inline bool get_and_clear_inhcall(void) { return false; }
 static __always_inline void restore_inhcall(bool inhcall) { }
+static __always_inline void xenpv_irqentry_exit_cond_resched(void) { }
 #endif

static void __xen_pv_evtchn_do_upcall(struct pt_regs *regs)
@@ -306,11 +315,11 @@ static void __xen_pv_evtchn_do_upcall(st

 	instrumentation_begin();
 	run_sysvec_on_irqstack_cond(__xen_pv_evtchn_do_upcall, regs);
+	instrumentation_end();

 	inhcall = get_and_clear_inhcall();
 	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
-		irqentry_exit_cond_resched();
-		instrumentation_end();
+		xenpv_irqentry_exit_cond_resched();
 		restore_inhcall(inhcall);
 	} else {
 		instrumentation_end();