Re: [PATCH] Allocate DMAR fault interrupts locally

From: Dimitri Sivanich
Date: Mon Mar 11 2024 - 16:40:07 EST


Thomas!

On Thu, Feb 29, 2024 at 11:18:37PM +0100, Thomas Gleixner wrote:
> Dimitri!
>
> On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
>
> > +}
> > +
> > +static int __init assign_dmar_vectors(void)
> > +{
> > + struct work_struct irq_remap_work;
> > + int nid;
> > +
> > + INIT_WORK(&irq_remap_work, irq_remap_enable_fault_handling_thr);
> > + cpus_read_lock();
> > + for_each_online_node(nid) {
> > + /* Boot cpu dmar vectors are assigned before the rest */
> > + if (nid == cpu_to_node(get_boot_cpu_id()))
> > + continue;
> > + schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> > + &irq_remap_work);
> > + flush_work(&irq_remap_work);
> > + }
> > + cpus_read_unlock();
> > + return 0;
> > +}
> > +
> > +arch_initcall(assign_dmar_vectors);
>
> Stray newline before arch_initcall(), but that's not the problem.
>
> The real problems are:
>
> 1) This approach only works when _ALL_ APs have been brought up during
> boot. With 'maxcpus=N' on the command line this will fail to enable
> fault handling when the APs which have not been brought up initially
> are onlined later on.
>
> This might be working in practice because intel_iommu_init() will
> enable the interrupts later on via init_dmars() unconditionally, but
> that's far from correct because IRQ_REMAP does not depend on
> INTEL_IOMMU.
>
> 2) It leaves a gap where the reporting is not working between bringing
> up the APs during boot and this initcall. Mostly theoretical, but
> that does not make it more correct either.
>
> What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
> space which enables the interrupt for the node _before_ the first AP of
> the node is brought up. That will solve the problem nicely w/o any of
> the above issues.
>

At first glance this sounds like a good approach. As things currently stand,
however, there are at least two problems with attempting to allocate interrupts
on cpus that are not yet running via the existing dmar_set_interrupt() path.

- The code relies on node_to_cpumask_map (cpumask_of_node()), which has been
allocated, but not yet populated, at the CPUHP_BP_PREPARE_DYN stage.

- The irq_matrix cpumaps are not yet marked online or initialized, except for
the boot cpu instance, of course.

As a result, allocation still falls back to the boot cpu until its vectors are
exhausted.
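
To make that fallback concrete, here is a small userspace model of the behavior
described above. It is not kernel code: struct cpu_state, init_topology(),
bring_up() and pick_target_cpu() are hypothetical names made up purely for
illustration, and the selection logic is a simplified assumption about why the
allocation lands on the boot cpu when the node mask and matrix state of the
target cpus have not been set up yet.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   8
#define NR_NODES  2
#define BOOT_CPU  0

/* Hypothetical stand-in for the per-cpu state the allocation path consults. */
struct cpu_state {
	int  node;          /* NUMA node this cpu belongs to */
	bool in_node_map;   /* models the cpu being set in cpumask_of_node() */
	bool matrix_online; /* models the irq_matrix cpumap being online */
};

/* Early boot: topology is known, but only the boot cpu is set up. */
void init_topology(struct cpu_state cpus[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].node = cpu / (NR_CPUS / NR_NODES);
		cpus[cpu].in_node_map = (cpu == BOOT_CPU);
		cpus[cpu].matrix_online = (cpu == BOOT_CPU);
	}
}

/* Models a cpu reaching the point where both pieces of state are valid. */
void bring_up(struct cpu_state cpus[NR_CPUS], int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].in_node_map = true;
	cpus[cpu].matrix_online = true;
}

/*
 * Pick a cpu on 'node' for the fault interrupt. If no cpu on that node is
 * usable yet, fall back to the boot cpu (assumed behavior, see above).
 */
int pick_target_cpu(const struct cpu_state cpus[NR_CPUS], int node)
{
	assert(node >= 0 && node < NR_NODES);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpus[cpu].node == node &&
		    cpus[cpu].in_node_map &&
		    cpus[cpu].matrix_online)
			return cpu;
	}
	return BOOT_CPU;
}
```

In this model, calling pick_target_cpu(cpus, 1) right after init_topology()
yields BOOT_CPU, which corresponds to running from a prepare-stage callback.
Once bring_up(cpus, 4) has been called, the same request yields cpu 4, which
corresponds to running after the first cpu of the node is up.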

Of course, running the dmar_set_interrupt code from a CPUHP_AP_ONLINE_DYN state
does work (although I believe there is a concurrency issue that could show up
with the current dmar_set_interrupt path).

So the code seems to have been designed based on the assumption that it will be
run on an already active (though not necessarily fully onlined?) cpu. To make
this work, any code based on that assumption would need to be fixed. Otherwise,
a different approach is needed.