Re: IOMMU+DMAR causing NMIs-s (was: 4.7-rc6: NMI in intel_idle on HP Proliant G6)

From: Meelis Roos
Date: Wed Jul 13 2016 - 05:15:03 EST


> > > > Bisecting kernel configs shows that it's DMAR+IOMMU. When it is
> > > > activated, there is high probability of NMI-s in random places.
> > >
> > > Hmm, strange. But nothing could really surprise when you have an HP
> > > BIOS.
> >
> > BIOS P64 01/22/2015. There seems to be a newer 2015.08.16 BIOS out but
> > the release notes only describe updated CPU microcode for security
> > reasons.
>
> It is probably something HP is selling as a "feature" and not a BIOS
> bug.

ROM setup settings that might be of interest:

Advanced memory protection: advanced ecc support
No-Execute memory protection: enabled
Intel virtualization technology: enabled
Intel hyperthreading options: enabled
Processor core disable: all cores enabled
Intel turbo boost technology: enabled
Intel VT-d: enabled
HP power profile: custom
HP power regulator: hp dynamic power savings mode (not OS control)
Intel qpi link power management: enabled
Minimum processor idle power core state: C6
Minimum processor idle power package state: C6
Dynamic power saving mode response: fast
Collaborative power control: enabled
MPS table: full table apic
NMI debug button: enabled
PCI bus padding options: enabled
HW prefetcher: enabled
Adjacent sector prefetch: enabled
Node interleaving: disabled


> > > Can you probably use the faulty config and bisect this down to a
> > > specific commit? In v4.7-rc1 some changes to the iova-allocation code
> > > got merged, but I have no idea how those could cause NMIs.
> >
> > Will try but I do not know a working base yet - this was broken in both
> > 4.6 and 4.7-rc.
>
> Oh, in that case it is not related to the recent iova changes. Does the
> box have any hardware error log which you can access and send to us
> (right after some NMIs happened)?

Nothing in the iLO log or the Integrated Management Log (IML).
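
With the hardware logs empty, one way to see when and on which CPUs the
NMIs arrive is to sample the per-CPU NMI counters in /proc/interrupts
before and after a test run. A minimal sketch in Python, assuming the
usual "NMI:" row layout (one count per CPU, then a description); the
function names nmi_counts and nmi_delta are made up for illustration:

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in /proc/interrupts text.

    Assumes the row starts with "NMI:" and that the numeric fields
    following it are the per-CPU counters, in CPU order.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the trailing description ("Non-maskable ...").
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    # No NMI row found.
    return []


def nmi_delta(before, after):
    """Return how many NMIs each CPU received between two samples."""
    return [b - a for a, b in zip(before, after)]
```

Usage would be to read /proc/interrupts into a string once before the
test and once after, pass both through nmi_counts(), and compare the two
results with nmi_delta(); any nonzero entry marks a CPU that took an NMI
in between.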

--
Meelis Roos (mroos@xxxxxxxx)