Re: [Intel-gfx] dmar messages caused by graphics.

From: Daniel Vetter
Date: Tue Oct 21 2014 - 11:36:47 EST


On Fri, Oct 17, 2014 at 05:17:16PM -0400, Dave Jones wrote:
> Just hit this while fuzz-testing (curiously, no graphics-related
> stuff was happening; X isn't even loaded on that box).
>
> dmar: DRHD: handling fault status reg 2
> dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7ffffff000
> DMAR:[fault reason 05] PTE Write access is not set
>
>
> 00:02.0 is..
>
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th
> Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00
> [VGA controller])
>
> 00: 86 80 12 04 07 04 90 00 06 00 00 03 00 00 00 00
> 10: 04 00 00 c0 00 00 00 00 0c 00 00 b0 00 00 00 00
> 20: 01 30 00 00 00 00 00 00 00 00 00 00 86 80 12 22
> 30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
>
>
> So then I rebooted, and noticed it spewed the exact same message on boot up too.
>
> I power cycled, and this time got
>
> [ 0.576231] dmar: Host address width 39
> [ 0.576336] dmar: DRHD base: 0x000000fed90000 flags: 0x0
> [ 0.576491] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
> [ 0.576659] dmar: DRHD base: 0x000000fed91000 flags: 0x1
> [ 0.576793] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
> [ 0.576961] dmar: RMRR base: 0x000000a2a1f000 end: 0x000000a2a32fff
> [ 0.577075] dmar: RMRR base: 0x000000ad800000 end: 0x000000af9fffff
> [ 6.715745] DMAR: No ATSR found
> [ 8.081845] [drm] DMAR active, disabling use of stolen memory
> [ 9.927343] dmar: DRHD: handling fault status reg 2
> [ 9.928335] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 3c11284000
> DMAR:[fault reason 05] PTE Write access is not set
> [ 11.916211] dmar: DRHD: handling fault status reg 2
> [ 11.917105] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 3c11284000
> DMAR:[fault reason 05] PTE Write access is not set
>
>
> Same thing, different fault address. It seems to change every time I boot.
>
>
> Looking in the logs, this started happening on the 15th. The first instance
> was this during boot..
>
> [ 9.917240] dmar: DRHD: handling fault status reg 2
> [ 9.918150] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7300000000
> [ 9.918150] DMAR:[fault reason 05] PTE Write access is not set
> [ 9.919582] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7ffffff000
> [ 9.919582] DMAR:[fault reason 05] PTE Write access is not set
> [ 10.157240] dmar: DRHD: handling fault status reg 3
> [ 10.158017] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 3579736000
> [ 10.158017] DMAR:[fault reason 05] PTE Write access is not set
> [ 11.926114] dmar: DRHD: handling fault status reg 3
> [ 11.927117] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7300000000
> [ 11.927117] DMAR:[fault reason 05] PTE Write access is not set
>
> That time, the 'reg 3' showed up.
>
> Dying hardware? Or a bug?

We see these occasionally after the GPU has gone bananas, and iirc also
sometimes after module reload (we probably botch the reinit stuff a bit).
That it happens without anything really going on on the gfx side is
slightly more disturbing indeed. Any chance this could be a kernel
regression?
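
To compare fault addresses across boots, the fault lines can be pulled out
of the saved logs with a short script. This is only a minimal sketch: it
assumes the exact "DMAR:[DMA Write] Request device [...] fault addr ..."
line format quoted above, and the names FAULT_RE, collect_faults and SAMPLE
are made up for illustration, not anything from the kernel tree.

```python
import re
from collections import defaultdict

# Matches fault lines in the format quoted in this thread, e.g.
#   dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7ffffff000
# (assumption: other kernel versions may word this message differently)
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] "
    r"Request device \[(?P<dev>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)


def collect_faults(lines):
    """Return {device: [distinct fault addresses, in first-seen order]}."""
    seen = defaultdict(list)
    for line in lines:
        m = FAULT_RE.search(line)
        if m is None:
            continue  # not a DMA fault line (e.g. the "fault reason" line)
        addr = int(m.group("addr"), 16)
        if addr not in seen[m.group("dev")]:
            seen[m.group("dev")].append(addr)
    return dict(seen)


# Lines copied from the first occurrence reported above.
SAMPLE = """\
[ 9.917240] dmar: DRHD: handling fault status reg 2
[ 9.918150] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7300000000
[ 9.918150] DMAR:[fault reason 05] PTE Write access is not set
[ 9.919582] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7ffffff000
[ 10.158017] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 3579736000
[ 11.927117] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 7300000000
""".splitlines()

if __name__ == "__main__":
    for dev, addrs in collect_faults(SAMPLE).items():
        print(dev, [hex(a) for a in addrs])
```

Feeding it the dmesg output of each boot in turn should show quickly
whether the set of addresses really is different every time.
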
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch