Re: [PATCH] Allocate DMAR fault interrupts locally

From: Jacob Pan
Date: Mon Mar 25 2024 - 14:53:39 EST


Hi Thomas,

On Sun, 24 Mar 2024 22:05:35 +0100, Thomas Gleixner <tglx@xxxxxxxxxxxxx>
wrote:

> Jacob!
>
> On Thu, Mar 21 2024 at 15:13, Jacob Pan wrote:
> > On Mon, 11 Mar 2024 15:38:59 -0500, Dimitri Sivanich <sivanich@xxxxxxx>
> > wrote:
> >> So the code seems to have been designed based on the assumption that it
> >> will be run on an already active (though not necessarily fully
> >> onlined?) cpu. To make this work, any code based on that assumption
> >> would need to be fixed. Otherwise, a different approach is needed.
> >
> > This may not be pretty but since DMAR fault is for unrecoverable faults,
> > they are rare and infrequent, should never happen on a healthy system.
> > Can we share one BSP vector for all DMARs? i.e. let dmar_fault()
> > handler search for the offending DMAR for fault reasons.
>
> It might not be pretty on the first thought, but it has a charm.
>
> You are right. If DMAR faults happen then there is an issue which is
> going to affect the machine badly anyway, so the extra search through
> the iommu list is not necessarily horrible and it does not matter at all
> whether the access is local or not.
>
> But there are two interrupts involved. The DMAR one and the SVM one.
Actually, there are three: there is also the perfmon interrupt on newer SoCs
with the enhanced command interface.

> And all of that DMAR/SVM code is built around the assumption that
> everything can be set up at boot time on the BSP.
>
> The DMAR case definitely is solvable by sharing the interrupt, see the
> uncompiled below. I still need to wrap my brain around the SVM part, but
> feel free to beat me to it or preferrably explain to me that it's not
> necessary at all :)
>
I don't think it is necessary for the SVM (page request) or perfmon IRQs.
Their affinity is already set to their local node and spread out. Unlike the
DMAR fault IRQ, they are performance critical, and the SVM PRQ handler needs
access to its local queue (see the potential bug below).

On my dual-socket system, IRQs 40-47 are DMAR-prq (SVM), and I see:

root@984fee003c4f:~/jacob/irqstat# for i in {40..47}; do cat /sys/kernel/debug/irq/irqs/$i | grep -e eff;done
effectiv: 2
effectiv: 3
effectiv: 4
effectiv: 5
effectiv: 49
effectiv: 50
effectiv: 51
effectiv: 52
root@984fee003c4f:~/jacob/irqstat# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ...
node 0 size: 64198 MB
node 0 free: 60458 MB
node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 ...
node 1 size: 64437 MB
node 1 free: 62793 MB
node distances:
node 0 1
0: 10 21
1: 21 10

The effective CPUs of the first four PRQ interrupts are on node 0 and those of
the last four are on node 1, i.e. each IOMMU's PRQ vector already lands on its
local node.

I will give the code below a try, perhaps using a built-in device to trigger
DMAR faults.
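For completeness, the "search the offending DMAR" variant I originally had in
mind would look roughly like the untested sketch below (dmar_fault_any() is a
made-up name); your patch gets the same effect by registering the shared
handler once per iommu with its own dev_id:

/*
 * Untested sketch, not part of the patch below: one handler that scans all
 * active DMAR units for pending primary faults and lets the existing
 * per-iommu dmar_fault() handler do the reporting.
 */
static irqreturn_t dmar_fault_any(int irq, void *arg)
{
	struct dmar_drhd_unit *drhd;
	struct intel_iommu *iommu;
	irqreturn_t ret = IRQ_NONE;

	rcu_read_lock();
	for_each_active_iommu(iommu, drhd) {
		/* PPF set means this unit has primary faults queued. */
		if (readl(iommu->reg + DMAR_FSTS_REG) & DMA_FSTS_PPF) {
			dmar_fault(irq, iommu);
			ret = IRQ_HANDLED;
		}
	}
	rcu_read_unlock();

	return ret;
}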

On a separate note, I just found out that the page request queues are not
allocated node-locally; they probably should be:

--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -67,7 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	struct page *pages;
 	int irq, ret;
 
-	pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, PRQ_ORDER);
+	pages = alloc_pages_node(iommu->node, GFP_KERNEL | __GFP_ZERO, PRQ_ORDER);
+
> ---
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1182,7 +1182,6 @@ static void free_iommu(struct intel_iomm
> iommu->pr_irq = 0;
> }
> free_irq(iommu->irq, iommu);
> - dmar_free_hwirq(iommu->irq);
> iommu->irq = 0;
> }
>
> @@ -1956,17 +1955,21 @@ void dmar_msi_mask(struct irq_data *data
> raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
> }
>
> -void dmar_msi_write(int irq, struct msi_msg *msg)
> +static void dmar_iommu_msi_write(struct intel_iommu *iommu, struct msi_msg *msg)
> {
> - struct intel_iommu *iommu = irq_get_handler_data(irq);
> int reg = dmar_msi_reg(iommu, irq);
> - unsigned long flag;
> + unsigned long flags;
>
> - raw_spin_lock_irqsave(&iommu->register_lock, flag);
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> writel(msg->data, iommu->reg + reg + 4);
> writel(msg->address_lo, iommu->reg + reg + 8);
> writel(msg->address_hi, iommu->reg + reg + 12);
> - raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +}
> +
> +void dmar_msi_write(int irq, struct msi_msg *msg)
> +{
> + dmar_iommu_msi_write(irq_get_handler_data(irq), msg);
> }
>
> void dmar_msi_read(int irq, struct msi_msg *msg)
> @@ -2100,23 +2103,37 @@ irqreturn_t dmar_fault(int irq, void *de
>
> int dmar_set_interrupt(struct intel_iommu *iommu)
> {
> - int irq, ret;
> + static int dmar_irq;
> + int ret;
>
> - /*
> - * Check if the fault interrupt is already initialized.
> - */
> + /* Don't initialize it twice for a given iommu */
> if (iommu->irq)
> return 0;
>
> - irq = dmar_alloc_hwirq(iommu->seq_id, iommu->node, iommu);
> - if (irq > 0) {
> - iommu->irq = irq;
> + /*
> + * There is one shared interrupt for all IOMMUs to prevent vector
> + * exhaustion.
> + */
> + if (!dmar_irq) {
> + int irq = dmar_alloc_hwirq(iommu->seq_id, iommu->node, iommu);
> +
> + if (irq <= 0) {
> + pr_err("No free IRQ vectors\n");
> + return -EINVAL;
> + }
> + dmar_irq = irq;
> } else {
> - pr_err("No free IRQ vectors\n");
> - return -EINVAL;
> + struct msi_msg msg;
> +
> + /*
> + * Get the MSI message from the shared interrupt and write
> + * it to the iommu MSI registers.
> + */
> + dmar_msi_read(dmar_irq, &msg);
> + dmar_iommu_msi_write(iommu, &msg);
> }
>
> - ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
> + ret = request_irq(dmar_irq, dmar_fault, IRQF_NO_THREAD | IRQF_SHARED, iommu->name, iommu);
> if (ret)
> pr_err("Can't request irq\n");
> return ret;


Thanks,

Jacob