Re: [PATCH v8 01/33] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR

From: Thomas Gleixner
Date: Sat Jun 03 2023 - 16:52:45 EST


On Mon, Apr 10 2023 at 01:14, Xin Li wrote:
> IRQ_MOVE_CLEANUP_VECTOR is the only one of the system IRQ vectors that
> is *below* FIRST_SYSTEM_VECTOR. It is a slow path, so just push it
> into common_interrupt() just before the spurious interrupt handling.

This is a complete NOOP on systems which are not FRED enabled, as the
IDT entry is still separate. So this change makes no sense outside of
the FRED universe. Can we pretty please make this consistent?

Aside from that, the change comes with zero justification. I can see why
this is done, i.e. to spare range checking in the FRED exception entry
code, but that brings up an interesting question:

IRQ_MOVE_CLEANUP_VECTOR is at vector 0x20 on purpose. 0x20 is the lowest
priority vector so that the following (mostly theoretical) situation
gets resolved:

    sysvec_irq_move_cleanup()
        if (is_pending_in_apic_IRR(vector_to_clean_up))
            apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);

I.e. when for whatever reason the to-be-cleaned-up vector is still
pending in the local APIC IRR, the function retriggers
IRQ_MOVE_CLEANUP_VECTOR and returns. As the pending to-be-cleaned-up
vector has higher priority, it gets handled _before_ the cleanup
vector. Otherwise this ends up in a live lock.
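
To illustrate the priority argument above, here is a minimal user-space
sketch. It is a toy model, not kernel code: struct toy_apic and all
toy_* helpers are made up for this example, and the model assumes that
the highest-numbered pending vector is always dispatched first. The
stale vector 0x31 and the alternative cleanup vector 0xec used in the
note below are arbitrary illustrative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_VECTORS 256

/* Toy model of the local APIC IRR: one pending flag per vector. */
struct toy_apic {
	bool irr[NR_VECTORS];
};

static void toy_raise(struct toy_apic *apic, int vector)
{
	assert(vector >= 0 && vector < NR_VECTORS);
	apic->irr[vector] = true;
}

/* Return the highest pending vector, or -1 if nothing is pending. */
static int toy_highest_pending(const struct toy_apic *apic)
{
	for (int v = NR_VECTORS - 1; v >= 0; v--) {
		if (apic->irr[v])
			return v;
	}
	return -1;
}

/*
 * Start with both the stale vector and the cleanup vector pending and
 * dispatch in priority order. Returns the number of dispatches needed
 * until the cleanup completes, or -1 if it did not complete within
 * max_steps (the live lock case).
 */
static int toy_run_cleanup(int cleanup_vector, int stale_vector, int max_steps)
{
	struct toy_apic apic;

	memset(&apic, 0, sizeof(apic));
	toy_raise(&apic, stale_vector);
	toy_raise(&apic, cleanup_vector);

	for (int step = 1; step <= max_steps; step++) {
		int v = toy_highest_pending(&apic);

		if (v < 0)
			return -1;
		apic.irr[v] = false;	/* vector accepted by the CPU */

		if (v != cleanup_vector)
			continue;	/* stale vector got handled */

		if (apic.irr[stale_vector]) {
			/* Still pending: retrigger the cleanup and return. */
			toy_raise(&apic, cleanup_vector);
			continue;
		}
		return step;		/* cleanup completed */
	}
	return -1;
}
```

In this model toy_run_cleanup(0x20, 0x31, 1000) completes after two
dispatches because the stale vector wins the priority arbitration,
while toy_run_cleanup(0xec, 0x31, 1000) never completes: the cleanup
vector keeps preempting the stale one and retriggers itself forever.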

So the question is whether FRED is changing that priority scheme or not.

> @@ -248,6 +248,10 @@ DEFINE_IDTENTRY_IRQ(common_interrupt)
>  	desc = __this_cpu_read(vector_irq[vector]);
>  	if (likely(!IS_ERR_OR_NULL(desc))) {
>  		handle_irq(desc, regs);
> +#ifdef CONFIG_SMP
> +	} else if (vector == IRQ_MOVE_CLEANUP_VECTOR) {
> +		sysvec_irq_move_cleanup(regs);

This nests IDTENTRY:

  common_interrupt()
    irqentry_enter();
    kvm_set_cpu_l1tf_flush_l1d();
    run_irq_on_irqstack_cond(__common_interrupt, ....)
      __common_interrupt()
        sysvec_irq_move_cleanup()
          irqentry_enter();               <- FAIL
          kvm_set_cpu_l1tf_flush_l1d();   <- WHY again?
          run_sysvec_on_irqstack_cond(__sysvec_irq_move_cleanup);
            __sysvec_irq_move_cleanup();
          irqentry_exit();

It does not matter whether the cleanup vector is a slow path or
not. Regular interrupts are not nesting, period. Exceptions nest, but
IRQ_MOVE_CLEANUP_VECTOR is not an exception and we don't make an
exception for it.
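
The nesting can be made visible with a small user-space sketch that
only counts entry depth. Again this is a toy model under stated
assumptions: struct toy_entry_state and the toy_* functions are made up
for illustration and merely mirror the shape of the call chain above,
they are not the kernel's entry code.

```c
#include <assert.h>

/* Bookkeeping for the toy entry/exit pairs. */
struct toy_entry_state {
	int depth;	/* current entry nesting level */
	int max_depth;	/* deepest nesting level seen */
};

static void toy_irqentry_enter(struct toy_entry_state *s)
{
	s->depth++;
	if (s->depth > s->max_depth)
		s->max_depth = s->depth;
}

static void toy_irqentry_exit(struct toy_entry_state *s)
{
	assert(s->depth > 0);
	s->depth--;
}

/* Stand-in for a full sysvec entry point: it does its own enter/exit. */
static void toy_sysvec_irq_move_cleanup(struct toy_entry_state *s)
{
	toy_irqentry_enter(s);
	/* the actual cleanup work would run here */
	toy_irqentry_exit(s);
}

/*
 * Stand-in for common_interrupt() as modified by the quoted hunk: a
 * vector with a descriptor is handled directly, anything else falls
 * through to the full cleanup entry point.
 */
static void toy_common_interrupt(struct toy_entry_state *s, int has_desc)
{
	toy_irqentry_enter(s);
	if (has_desc) {
		/* handle_irq() equivalent, no further entry work */
	} else {
		toy_sysvec_irq_move_cleanup(s);
	}
	toy_irqentry_exit(s);
}

/* Run one toy interrupt and report the deepest entry nesting seen. */
static int toy_max_entry_depth(int has_desc)
{
	struct toy_entry_state s = { 0, 0 };

	toy_common_interrupt(&s, has_desc);
	assert(s.depth == 0);	/* enter/exit must stay balanced */
	return s.max_depth;
}
```

In this model toy_max_entry_depth(1) yields 1 for a regular device
interrupt, while toy_max_entry_depth(0) yields 2 for the cleanup path,
which is exactly the nested entry objected to above.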

Stop this mindless hackery and clean it up so it is correct for all
variants.

Thanks,

tglx