[BUG][REGRESSION] i915 gpu hangs under load

From: Martin Kepplinger
Date: Wed Mar 22 2017 - 04:39:32 EST


Hi

I know something similar has already been reported here: https://bugs.freedesktop.org/show_bug.cgi?id=100110

But this is rc3 and my machine is totally *not usable*. Let me be annoying :) I hope I can help:

Since 4.11-rc1 I get GPU hangs and resets under load. This is almost certainly a kernel issue: 4.10 is fine.
I run a Debian stable userspace. nouveau is running on this machine too.

Mar 22 09:17:01 martin-laptop kernel: [ 2409.538706] [drm] GPU HANG: ecode 7:0:0xf3cffffe, in gnome-shell [1869], reason: Hang on render ring, action: reset
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538711] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538713] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538714] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538715] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538716] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar 22 09:17:01 martin-laptop kernel: [ 2409.538768] drm/i915: Resetting chip after gpu hang
Mar 22 09:17:09 martin-laptop kernel: [ 2417.537886] drm/i915: Resetting chip after gpu hang
Mar 22 09:17:17 martin-laptop kernel: [ 2425.537152] drm/i915: Resetting chip after gpu hang
Mar 22 09:17:25 martin-laptop kernel: [ 2433.536407] drm/i915: Resetting chip after gpu hang
Mar 22 09:17:33 martin-laptop kernel: [ 2441.539674] drm/i915: Resetting chip after gpu hang
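
For anyone triaging similar logs, here is a minimal Python sketch that pulls the fields out of the "GPU HANG" line and measures the spacing of the "Resetting chip" messages. The regular expressions are only modelled on the lines quoted above, and the helper names (parse_hang, reset_intervals) are made up for illustration; other kernel versions may format these messages differently.

```python
import re

# Pattern modelled on the "GPU HANG" line quoted above (an assumption,
# not a format guaranteed by the driver).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)
# Kernel timestamp in square brackets, e.g. "[ 2409.538768]".
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def parse_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


def reset_intervals(lines):
    """Seconds between consecutive 'Resetting chip after gpu hang' messages."""
    stamps = []
    for line in lines:
        if "Resetting chip after gpu hang" not in line:
            continue
        m = STAMP_RE.search(line)
        if m:
            stamps.append(float(m.group(1)))
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
```

Run over the five reset lines above, reset_intervals() gives gaps of roughly 8 seconds each, which matches the wall-clock timestamps (09:17:01, 09:17:09, ...) in the log.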


Furthermore, there are weird, small display distortions occurring. I don't get any log about them and
don't have a screenshot. Well, never mind. Please fix 4.11 and CC anyone I forgot.


thanks

martin
GPU HANG: ecode 7:0:0xf3cffffe, in gnome-shell [1869], reason: Hang on render ring, action: reset
Kernel: 4.11.0-rc3-00003-gbc61cd2
Time: 1490170621 s 524489 us
Boottime: 2409 s 756155 us
Uptime: 2395 s 323536 us
is_mobile: no
is_lp: no
is_alpha_support: no
has_64bit_reloc: no
has_aliasing_ppgtt: yes
has_csr: no
has_ddi: yes
has_decoupled_mmio: no
has_dp_mst: yes
has_fbc: yes
has_fpga_dbg: yes
has_full_ppgtt: yes
has_full_48bit_ppgtt: no
has_gmbus_irq: yes
has_gmch_display: no
has_guc: no
has_hotplug: yes
has_hw_contexts: yes
has_l3_dpf: yes
has_llc: yes
has_logical_ring_contexts: no
has_overlay: no
has_pipe_cxsr: no
has_pooled_eu: no
has_psr: yes
has_rc6: yes
has_rc6p: no
has_resource_streamer: yes
has_runtime_pm: yes
has_snoop: no
cursor_needs_physical: no
hws_needs_physical: no
overlay_needs_physical: no
supports_tv: no
Active process (on ring render): gnome-shell [1869], context bans 0
Reset count: 0
Suspend count: 0
Platform: HASWELL
PCI ID: 0x0416
PCI Revision: 0x06
PCI Subsystem: 10cf:17ac
IOMMU enabled?: 0
EIR: 0x00000000
IER: 0xfc002529
GTIER: 0x00401821
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000001
DERRMR: 0xffffffff
CCID: 0x00ef410d
Missed interrupts: 0x00000000
fence[0] = 00000000
fence[1] = 00000000
fence[2] = 00000000
fence[3] = 00000000
fence[4] = 00000000
fence[5] = 00000000
fence[6] = 00000000
fence[7] = 00000000
fence[8] = 00000000
fence[9] = 00000000
fence[10] = 00000000
fence[11] = 00000000
fence[12] = 00000000
fence[13] = 00000000
fence[14] = 00000000
fence[15] = 00000000
fence[16] = 00000000
fence[17] = 00000000
fence[18] = 4b530770374a001
fence[19] = 00000000
fence[20] = 00000000
fence[21] = 00000000
fence[22] = 00000000
fence[23] = 00000000
fence[24] = 00000000
fence[25] = 00000000
fence[26] = 00000000
fence[27] = 00000000
fence[28] = 00000000
fence[29] = 00000000
fence[30] = 00000000
fence[31] = 00000000
ERROR: 0x00000109
DONE_REG: 0xffffffff
ERR_INT: 0x00000000
render command stream:
START: 0x007ea000
HEAD: 0x07a1f6dc [0x0001f648]
TAIL: 0x0001f8f8 [0x0001f728, 0x0001f760]
CTL: 0x0001f001
MODE: 0x00004000
HWS: 0x7fff0000
ACTHD: 0x00000000 07a1f6dc
IPEIR: 0x00000000
IPEHR: 0x0c000000
INSTDONE: 0xffcffffe
SC_INSTDONE: 0xffffffff
SAMPLER_INSTDONE[0][0]: 0xffffffff
ROW_INSTDONE[0][0]: 0xffffffff
BBADDR: 0x00000000_7fa48330
BB_STATE: 0x00000000
INSTPS: 0x00000500
INSTPM: 0x00006080
FADDR: 0x00000000 008096d8
RC PSMI: 0x00000010
FAULT_REG: 0x000000c5
SYNC_0: 0x00000000
SYNC_1: 0x0001c2a1
SYNC_2: 0x00000000
GFX_MODE: 0x00002a00
PP_DIR_BASE: 0x7fdf0000
seqno: 0x0001c29a
last_seqno: 0x0001c2a2
waiting: yes
ring->head: 0x00016e60
ring->tail: 0x0001f8f8
hangcheck stall: yes
hangcheck action: dead
hangcheck action timestamp: 4295493232, 204600 ms ago
blt command stream:
START: 0x0080a000
HEAD: 0x07e0e8d0 [0x00000000]
TAIL: 0x0000e8d0 [0x00000000, 0x00000000]
CTL: 0x0001f001
MODE: 0x00000200
HWS: 0x7fff1000
ACTHD: 0x00000000 07e0e8d0
IPEIR: 0x00000000
IPEHR: 0x01000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000_7fff4028
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 008188d0
RC PSMI: 0x00000011
FAULT_REG: 0x00000000
SYNC_0: 0x0001c29a
SYNC_1: 0x00000000
SYNC_2: 0x00000000
GFX_MODE: 0x00000200
PP_DIR_BASE: 0x7fdf0000
seqno: 0x0001c2a1
last_seqno: 0x0001c2a1
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck stall: no
hangcheck action: idle
hangcheck action timestamp: 4295494736, 198584 ms ago
bsd command stream:
START: 0x0082a000
HEAD: 0x00000000 [0x00000000]
TAIL: 0x00000000 [0x00000000, 0x00000000]
CTL: 0x0001f001
MODE: 0x00000200
HWS: 0x7fff2000
ACTHD: 0x00000000 00000000
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000_00000000
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 0082a000
RC PSMI: 0x00000011
FAULT_REG: 0x00000000
SYNC_0: 0x0001c2a1
SYNC_1: 0x0001c29a
SYNC_2: 0x00000000
GFX_MODE: 0x00000200
PP_DIR_BASE: 0x00000000
seqno: 0x00000000
last_seqno: 0x00000000
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck stall: no
hangcheck action: idle
hangcheck action timestamp: 4295494736, 198584 ms ago
vebox command stream:
START: 0x0084a000
HEAD: 0x00000000 [0x00000000]
TAIL: 0x00000000 [0x00000000, 0x00000000]
CTL: 0x0001f001
MODE: 0x00000200
HWS: 0x7fff3000
ACTHD: 0x00000000 00000000
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0xfffffffe
BBADDR: 0x00000000_00000000
BB_STATE: 0x00000000
INSTPS: 0x00000000
INSTPM: 0x00000000
FADDR: 0x00000000 0084a000
RC PSMI: 0x00000011
FAULT_REG: 0x00000000
SYNC_0: 0x0001c2a1
SYNC_1: 0x0001c29a
SYNC_2: 0x00000000
GFX_MODE: 0x00000200
PP_DIR_BASE: 0x00000000
seqno: 0x00000000
last_seqno: 0x00000000
waiting: no
ring->head: 0x00000000
ring->tail: 0x00000000
hangcheck stall: no
hangcheck action: idle
hangcheck action timestamp: 4295494736, 198584 ms ago
Active (render ring) [40]:
00000000_7fff8000 8192 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7e9df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 X uncached (name: 2)
00000000_7fff7000 4096 36 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7d5df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
00000000_7c1df000 20971520 36 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
00000000_7bcdf000 5242880 36 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7fff6000 4096 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7b2df000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 10)
00000000_7b2d5000 40960 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7a8d5000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 8)
00000000_7a655000 2621440 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
00000000_7fff5000 4096 37 00 [ 1c29e 00 00 00 00 ] 00 LLC
00000000_7a575000 917504 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 11)
00000000_7a535000 262144 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
00000000_79b35000 10485760 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 5)
00000000_79535000 6291456 37 00 [ 1c29e 00 00 00 00 ] 00 X LLC (name: 12)
00000000_793b5000 1572864 37 00 [ 1c29e 00 00 00 00 ] 00 Y LLC
00000000_793ad000 32768 37 00 [ 1c29e 00 00 00 00 ] 00 dirty LLC
00000000_00ef3000 4096 09 00 [ 1c29e 00 00 00 00 ] 00 dirty purgeable LLC
00000000_77f9d000 4096 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty LLC
00000000_77f9c000 4096 37 00 [ 1c2a0 00 00 00 00 ] 00 LLC
00000000_77f98000 16384 37 00 [ 1c2a0 00 00 00 00 ] 00 purgeable LLC
00000000_77f97000 4096 37 00 [ 1c2a0 00 00 00 00 ] 00 LLC
00000000_77e97000 1048576 37 00 [ 1c2a0 00 00 00 00 ] 00 X LLC
00000000_77e93000 16384 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty purgeable LLC
00000000_7fffa000 16384 37 00 [ 1c2a0 00 00 00 00 ] 00 dirty LLC
00000000_00f06000 4096 09 00 [ 1c2a0 00 00 00 00 ] 00 dirty purgeable LLC
00000000_77fa8000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77fa1000 28672 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77fa0000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77f9f000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77f9e000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77e56000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 dirty LLC
00000000_77e55000 4096 37 00 [ 1c2a2 00 00 00 00 ] 00 LLC
00000000_77e51000 16384 37 00 [ 1c2a2 00 00 00 00 ] 00 purgeable LLC
00000000_77e5b000 229376 36 00 [ 1c2a2 00 00 00 00 ] 00 X LLC
00000000_789ad000 10485760 36 00 [ 1c2a2 00 00 00 00 ] 00 X LLC
00000000_77e4d000 16384 37 00 [ 1c2a2 00 00 00 00 ] 00 purgeable LLC
00000000_77fa9000 16384 37 00 [ 1c2a2 00 00 00 00 ] 00 dirty LLC
00000000_00f07000 4096 09 00 [ 1c2a2 00 00 00 00 ] 00 dirty purgeable LLC
Pinned (global) [15]:
00000000_7fddf000 69632 41 00 [ 00 00 00 00 00 ] 00 LLC
00000000_7fff0000 4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
00000000_007ea000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
00000000_7fffe000 4096 41 00 [ 00 00 00 00 00 ] 00 LLC
00000000_7fff1000 4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
00000000_0080a000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
00000000_7fff2000 4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
00000000_0082a000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
00000000_7fff3000 4096 01 01 [ 00 00 00 00 00 ] 00 purgeable LLC
00000000_0084a000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty LLC
00000000_00000000 8294400 41 00 [ 00 00 00 00 00 ] 00 uncached
00000000_00f49000 16384 40 00 [ 00 00 00 00 00 ] 00 dirty uncached
00000000_00ee2000 69632 41 00 [ 00 00 00 00 00 ] 00 LLC
00000000_00ef4000 69632 41 00 [ 00 00 00 00 00 ] 00 LLC
00000000_0374a000 20971520 36 00 [ 00 00 00 00 00 ] 00 X dirty uncached (name: 3) (fence: 18)
render ring --- 3 requests
pid 1869, ban score 0, seqno 4:0001c29e, emitted 207960ms ago, head 0001f648, tail 0001f760
pid 934, ban score 0, seqno 1:0001c2a0, emitted 207924ms ago, head 0001f760, tail 0001f878
pid 934, ban score 0, seqno 1:0001c2a2, emitted 207908ms ago, head 0001f878, tail 0001f8f8
render ring --- 2 waiters
seqno 0x0001c29e for gnome-shell [1869]
seqno 0x0001c2a0 for Xorg [934]
Num Pipes: 3
PWR_WELL_CTL2: c0000000
Pipe [0]:
Power: on
SRC: 077f04af
STAT: 00000000
Plane [0]:
CNTR: d9000400
STRIDE: 00003c00
SURF: 0374a000
TILEOFF: 00000000
Cursor [0]:
CNTR: 05000027
POS: 02740205
BASE: 00f49000
Pipe [1]:
Power: on
SRC: 077f0437
STAT: 00000000
Plane [1]:
CNTR: d9000400
STRIDE: 00003c00
SURF: 03759000
TILEOFF: 00000000
Cursor [1]:
CNTR: 00000000
POS: 00000000
BASE: 00000000
Pipe [2]:
Power: on
SRC: 00000000
STAT: 00000000
Plane [2]:
CNTR: 00000000
STRIDE: 00000000
SURF: 00000000
TILEOFF: 00000000
Cursor [2]:
CNTR: 00000000
POS: 00000000
BASE: 00000000
CPU transcoder: A
Power: on
CONF: c0000000
HTOTAL: 081f077f
HBLANK: 081f077f
HSYNC: 07cf07af
VTOTAL: 04d204af
VBLANK: 04d204af
VSYNC: 04b804b2
CPU transcoder: B
Power: on
CONF: 00000000
HTOTAL: 072f068f
HBLANK: 072f068f
HSYNC: 06df06bf
VTOTAL: 04560437
VBLANK: 04370419
VSYNC: 0422041c
CPU transcoder: C
Power: on
CONF: 00000000
HTOTAL: 00000000
HBLANK: 00000000
HSYNC: 00000000
VTOTAL: 00000000
VBLANK: 00000000
VSYNC: 00000000
CPU transcoder: EDP
Power: on
CONF: c0000000
HTOTAL: 081f077f
HBLANK: 081f077f
HSYNC: 07cf07af
VTOTAL: 04560437
VBLANK: 04560437
VSYNC: 043f043a
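
To see at a glance which ring is stuck in an error state like the one above, the per-ring "seqno" and "last_seqno" values can be compared. The following Python sketch does that. It assumes that "seqno" is the last request the hardware completed and "last_seqno" the last one submitted, and that every ring section starts with a "<name> command stream:" line followed by "key: value" lines, as in this dump. The function names (parse_rings, pending_requests) are hypothetical.

```python
import re

# Header of a per-ring section, e.g. "render command stream:".
SECTION_RE = re.compile(r"^(\w+) command stream:$")


def parse_rings(dump_text):
    """Map ring name -> {field: value} for each '<name> command stream:' section."""
    rings = {}
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            current = rings.setdefault(m.group(1), {})
            continue
        if current is None:
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            # A line that is not "key: value" is taken to end the section.
            current = None
            continue
        current[key] = value
    return rings


def pending_requests(rings):
    """Ring name -> (last_seqno - seqno, whether hangcheck reports a stall)."""
    report = {}
    for name, fields in rings.items():
        done = int(fields.get("seqno", "0x0"), 16)
        submitted = int(fields.get("last_seqno", "0x0"), 16)
        stalled = fields.get("hangcheck stall") == "yes"
        report[name] = (submitted - done, stalled)
    return report
```

For this dump, the render ring is the only one with a gap between the two values (0x0001c29a against 0x0001c2a2) and with "hangcheck stall: yes"; on blt, bsd and vebox the two values are equal.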