Re: [patch 1/4] vt-d: quirk for masking vtd spec errors to platform error handling logic

From: Jesse Barnes
Date: Mon Dec 06 2010 - 12:34:38 EST


On Tue, 30 Nov 2010 22:22:26 -0800
Suresh Siddha <suresh.b.siddha@xxxxxxxxx> wrote:

> On platforms with the Intel 7500 chipset, there were some reports of system
> hangs/NMIs during kexec/kdump with interrupt-remapping enabled.
>
> During kdump, there is a window where devices may still be using the old
> kernel's interrupt configuration while the kdump kernel is coming up. This
> can cause vt-d faults, because the interrupt configuration from the old
> kernel maps to null IRTE entries (among other things) in the new kernel.
> (Without interrupt-remapping enabled we still have the same issue, but in
> that case the new kernel just sees benign spurious interrupts.)
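A toy userspace model of the failure described above (not kernel code; `struct toy_irte`, `toy_remap()` and `TOY_IRT_SIZE` are hypothetical names, and a real IRTE is a 128-bit entry defined by the VT-d spec). The kdump kernel has programmed only some entries of its fresh remapping table, so an interrupt that still carries an index set up by the old kernel lands on a non-present entry and is reported as a fault:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model only: a real IRTE is a 128-bit structure defined by the VT-d
 * spec. Here we keep just a Present flag and a destination vector.
 */
struct toy_irte {
	bool present;
	uint8_t vector;
};

#define TOY_IRT_SIZE 8

/*
 * Look up an interrupt's remapping entry by index. Returns 0 and stores the
 * vector on success, or -1 to model a remapping fault (index out of range
 * or entry not present).
 */
static int toy_remap(const struct toy_irte *table, size_t index,
		     uint8_t *vector)
{
	assert(table != NULL && vector != NULL);
	if (index >= TOY_IRT_SIZE || !table[index].present)
		return -1;
	*vector = table[index].vector;
	return 0;
}
```

For example, with a table where only entry 0 is present (vector 0x30), `toy_remap(table, 0, &vec)` returns 0 and sets `vec` to 0x30, while `toy_remap(table, 5, &vec)` returns -1: that is the stale-index case which raises the vt-d fault discussed here.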
>
> Depending on platform config settings, these platforms seem to generate an
> NMI/SMI when a vt-d fault happens, and there were reports that the resulting
> SMI causes the system to hang.
>
> Fix it by masking vt-d spec-defined errors from the platform error reporting
> logic. VT-d spec related errors are already handled by the VT-d OS code, so
> there is no need to report the same erorr through other channels.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> Cc: stable@xxxxxxxxxx [v2.6.32+]
> ---
> drivers/pci/quirks.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> Index: tip/drivers/pci/quirks.c
> ===================================================================
> --- tip.orig/drivers/pci/quirks.c
> +++ tip/drivers/pci/quirks.c
> @@ -2764,6 +2764,26 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RI
> DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
> #endif /*CONFIG_MMC_RICOH_MMC*/
>
> +#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
> +/*
> + * This is a quirk for masking vt-d spec defined errors to platform error
> + * handling logic. With out this, platforms seem to generate NMI/SMI (based
> + * on the RAS config settings of the platform) when a vt-d fault happens and
> + * there were reports that the resulting SMI causes system to hang.
> + *
> + * VT-d spec related errors are already handled by the VT-d OS code, so no
> + * need to report the same erorr through other channels.
> + */
> +static void vtd_mask_spec_errors(struct pci_dev *dev)
> +{
> + u32 word;
> +
> + pci_read_config_dword(dev, 0x1AC, &word);
> + pci_write_config_dword(dev, 0x1AC, word | (1 << 31));
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
> +#endif
>
> static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
> struct pci_fixup *end)
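For reference, the trailing context of the hunk is `pci_do_fixups()`, which walks the table that `DECLARE_PCI_FIXUP_EARLY()` entries are collected into and runs every hook whose vendor/device pair matches the device (with `PCI_ANY_ID` acting as a wildcard). A minimal userspace sketch of that matching, assuming hypothetical `toy_*` types and leaving out the linker-section placement the kernel actually uses; the two table entries mirror device IDs 0x342e and 0x3c28 from the quirk, and 0x8086 is Intel's PCI vendor ID:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard, analogous to (u16) PCI_ANY_ID in the kernel. */
#define TOY_ANY_ID 0xffffu

struct toy_dev {
	uint16_t vendor;
	uint16_t device;
	int fixups_run;		/* counts how many hooks matched this device */
};

struct toy_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct toy_dev *dev);
};

/* Stand-in for vtd_mask_spec_errors(): just records that it ran. */
static void toy_mask_hook(struct toy_dev *dev)
{
	dev->fixups_run++;
}

static const struct toy_fixup toy_fixups[] = {
	{ 0x8086, 0x342e, toy_mask_hook },
	{ 0x8086, 0x3c28, toy_mask_hook },
};

/* Run every hook in [f, end) whose vendor/device matches (or is a wildcard). */
static void toy_do_fixups(struct toy_dev *dev, const struct toy_fixup *f,
			  const struct toy_fixup *end)
{
	assert(dev != NULL);
	for (; f < end; f++) {
		if ((f->vendor == dev->vendor || f->vendor == TOY_ANY_ID) &&
		    (f->device == dev->device || f->device == TOY_ANY_ID))
			f->hook(dev);
	}
}
```

A device with vendor 0x8086 and device 0x342e gets the hook run once; any other Intel device ID is skipped, which is why the quirk only touches the listed chipsets.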

Can we make these registers and bits a bit more self-documenting (i.e.
#defines for both, maybe along with other useful bit definitions for
this reg)? Also, "error" is misspelled as "erorr" above. :)
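One possible shape for the self-documenting version asked for above, as a userspace sketch. `VTD_ERR_MASK_REG` and `VTD_ERR_MASK_SPEC_ERRS` are hypothetical names (the patch only gives offset 0x1AC and bit 31; the real register name would have to come from the chipset documentation), and a toy config-space array stands in for `pci_read_config_dword()`/`pci_write_config_dword()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical names: the patch only uses the raw offset 0x1AC and bit 31.
 * Real names should come from the chipset datasheet.
 */
#define VTD_ERR_MASK_REG	0x1AC
#define VTD_ERR_MASK_SPEC_ERRS	(1u << 31)

/* Toy 4 KiB config space standing in for the PCI config accessors. */
static uint8_t toy_cfg[4096];

static uint32_t toy_read_dword(size_t off)
{
	uint32_t v;

	assert(off % 4 == 0 && off + 4 <= sizeof(toy_cfg));
	memcpy(&v, &toy_cfg[off], sizeof(v));
	return v;
}

static void toy_write_dword(size_t off, uint32_t v)
{
	assert(off % 4 == 0 && off + 4 <= sizeof(toy_cfg));
	memcpy(&toy_cfg[off], &v, sizeof(v));
}

/* Read-modify-write: set only the mask bit, preserving the other bits. */
static void toy_vtd_mask_spec_errors(void)
{
	uint32_t word = toy_read_dword(VTD_ERR_MASK_REG);

	toy_write_dword(VTD_ERR_MASK_REG, word | VTD_ERR_MASK_SPEC_ERRS);
}
```

The read-modify-write sets only bit 31 and leaves the rest of the register as it was, so running it more than once is idempotent. Writing the bit as `1u << 31` also keeps the shift unsigned; `1 << 31` on a 32-bit `int` is undefined behavior in ISO C, even though that idiom is common in kernel code.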

--
Jesse Barnes, Intel Open Source Technology Center