Re: [Intel-gfx] [GIT pull] x86 APIC updates for 4.15

From: Maarten Lankhorst
Date: Mon Dec 11 2017 - 12:32:11 EST


Op 11-12-17 om 12:06 schreef Thomas Gleixner:
> On Mon, 11 Dec 2017, Thomas Gleixner wrote:
>
>> On Mon, 11 Dec 2017, Daniel Vetter wrote:
>>> Anything else we can do to move this? I just had to resolve a small
>>> conflict when moving forward to -rc3. Carrying a revert for the entire
>>> apic pull (too many deps to just revert the bisected patch) is a bit
>>> annoying.
>> alpine.DEB.2.20.1712081129450.1840@nanos">https://lkml.kernel.org/r/alpine.DEB.2.20.1712081129450.1840@nanos
> Another question. Is this broken on mainline or just with your new bits
> targeted for next?
>
> Thanks,
>
> tglx

I'm on mainline, I can reproduce the do_IRQ: no irq handler for vector 4.33 during a
normal suspend/resume cycle, but afaict dead irqs only showed up on hibernation fail. Not all
platforms supported on i915 seem affected, though I've sometimes also seen it affect
other drivers.. MEI iirc

Previous attempt with single s/r:

https://lkml.org/lkml/2017/12/6/215

swapoff -a
swapon /swapfile
echo disk > /sys/power/state

root@snb-2600:~# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] d.h3 1240.663032: irq_matrix_alloc: bit=33 cpu=1 online=1 avl=202 alloc=1 managed=0 online_maps=8 global_avl=1611, global_rsvd=15, total_alloc=13
<idle>-0 [000] d.h3 1240.663035: vector_update: irq=29 vector=33 cpu=1 prev_vector=42 prev_cpu=0
<idle>-0 [000] d.h3 1240.663036: vector_alloc: irq=29 vector=33 reserved=1 ret=0
<idle>-0 [000] d.h3 1240.663037: vector_config: irq=29 vector=33 cpu=1 apicdest=0x00000002
<idle>-0 [000] d.H3 1240.663095: vector_free_moved: irq=29 cpu=0 vector=42 is_managed=0
<idle>-0 [000] d.H3 1240.663096: irq_matrix_free: bit=42 cpu=0 online=1 avl=196 alloc=7 managed=0 online_maps=8 global_avl=1612, global_rsvd=15, total_alloc=12
kworker/u16:21-1945 [005] d..1 1240.668553: vector_deactivate: irq=27 is_managed=0 can_reserve=1 early=0
kworker/u16:21-1945 [005] d..2 1240.668560: vector_clear: irq=27 vector=40 cpu=0 prev_vector=0 prev_cpu=0
kworker/u16:21-1945 [005] d..2 1240.668561: irq_matrix_free: bit=40 cpu=0 online=1 avl=197 alloc=6 managed=0 online_maps=8 global_avl=1613, global_rsvd=15, total_alloc=11
kworker/u16:21-1945 [005] d..2 1240.668562: irq_matrix_reserve: online_maps=8 global_avl=1613, global_rsvd=16, total_alloc=11
kworker/u16:21-1945 [005] d..2 1240.668562: vector_reserve: irq=27 ret=0
kworker/u16:21-1945 [005] d..2 1240.668563: vector_config: irq=27 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:21-1945 [005] d..1 1240.668874: vector_teardown: irq=27 is_managed=0 has_reserved=1
kworker/u16:21-1945 [005] d..1 1240.668875: irq_matrix_remove_reserved: online_maps=8 global_avl=1613, global_rsvd=15, total_alloc=11
kworker/u16:22-1946 [002] d..1 1240.676865: vector_deactivate: irq=29 is_managed=0 can_reserve=1 early=0
kworker/u16:22-1946 [002] d..2 1240.676867: vector_clear: irq=29 vector=33 cpu=1 prev_vector=0 prev_cpu=0
kworker/u16:22-1946 [002] d..2 1240.676868: irq_matrix_free: bit=33 cpu=1 online=1 avl=203 alloc=0 managed=0 online_maps=8 global_avl=1614, global_rsvd=15, total_alloc=10
kworker/u16:22-1946 [002] d..2 1240.676869: irq_matrix_reserve: online_maps=8 global_avl=1614, global_rsvd=16, total_alloc=10
kworker/u16:22-1946 [002] d..2 1240.676869: vector_reserve: irq=29 ret=0
kworker/u16:22-1946 [002] d..2 1240.676870: vector_config: irq=29 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:22-1946 [002] d..1 1240.676912: vector_teardown: irq=29 is_managed=0 has_reserved=1
kworker/u16:22-1946 [002] d..1 1240.676913: irq_matrix_remove_reserved: online_maps=8 global_avl=1614, global_rsvd=15, total_alloc=10
<idle>-0 [001] d..2 1240.841921: irq_matrix_online: online_maps=2 global_avl=396, global_rsvd=15, total_alloc=15
<idle>-0 [002] d..2 1240.847061: irq_matrix_online: online_maps=3 global_avl=599, global_rsvd=15, total_alloc=15
<idle>-0 [003] d..2 1240.851371: irq_matrix_online: online_maps=4 global_avl=802, global_rsvd=15, total_alloc=15
<idle>-0 [004] d..2 1240.855968: irq_matrix_online: online_maps=5 global_avl=1005, global_rsvd=15, total_alloc=15
<idle>-0 [005] d..2 1240.860557: irq_matrix_online: online_maps=6 global_avl=1208, global_rsvd=15, total_alloc=15
<idle>-0 [006] d..2 1240.865294: irq_matrix_online: online_maps=7 global_avl=1411, global_rsvd=15, total_alloc=15
<idle>-0 [007] d..2 1240.870466: irq_matrix_online: online_maps=8 global_avl=1614, global_rsvd=15, total_alloc=15
kworker/u16:23-1947 [003] d..1 1240.903430: irq_matrix_reserve: online_maps=8 global_avl=1614, global_rsvd=16, total_alloc=15
kworker/u16:23-1947 [003] d..1 1240.903432: vector_reserve: irq=27 ret=0
kworker/u16:23-1947 [003] d..1 1240.903433: vector_config: irq=27 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:23-1947 [003] .... 1240.903434: vector_setup: irq=27 is_legacy=0 ret=0
kworker/u16:23-1947 [003] .... 1240.903458: vector_activate: irq=27 is_managed=0 can_reserve=1 early=1
kworker/u16:23-1947 [003] d..1 1240.903474: vector_config: irq=27 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:5-152 [001] d..1 1240.903484: irq_matrix_reserve: online_maps=8 global_avl=1614, global_rsvd=17, total_alloc=15
kworker/u16:5-152 [001] d..1 1240.903485: vector_reserve: irq=29 ret=0
kworker/u16:5-152 [001] d..1 1240.903486: vector_config: irq=29 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:5-152 [001] .... 1240.903486: vector_setup: irq=29 is_legacy=0 ret=0
kworker/u16:5-152 [001] .... 1240.903499: vector_activate: irq=29 is_managed=0 can_reserve=1 early=1
kworker/u16:5-152 [001] d..1 1240.903499: vector_config: irq=29 vector=239 cpu=0 apicdest=0x00000001
kworker/u16:23-1947 [003] d..1 1240.903638: vector_activate: irq=27 is_managed=0 can_reserve=1 early=0
kworker/u16:23-1947 [003] d..2 1240.903640: irq_matrix_alloc: bit=42 cpu=0 online=1 avl=192 alloc=11 managed=0 online_maps=8 global_avl=1613, global_rsvd=16, total_alloc=16
kworker/u16:23-1947 [003] d..2 1240.903641: vector_update: irq=27 vector=42 cpu=0 prev_vector=0 prev_cpu=0
kworker/u16:23-1947 [003] d..2 1240.903642: vector_alloc: irq=27 vector=42 reserved=1 ret=0
kworker/u16:23-1947 [003] d..2 1240.903643: vector_config: irq=27 vector=42 cpu=0 apicdest=0x00000001
kworker/u16:23-1947 [003] d..3 1240.903695: vector_config: irq=27 vector=42 cpu=0 apicdest=0x00000001
kworker/u16:5-152 [001] d..1 1240.903908: vector_activate: irq=29 is_managed=0 can_reserve=1 early=0
kworker/u16:5-152 [001] d..2 1240.903909: irq_matrix_alloc: bit=43 cpu=0 online=1 avl=191 alloc=12 managed=0 online_maps=8 global_avl=1612, global_rsvd=15, total_alloc=17
kworker/u16:5-152 [001] d..2 1240.903910: vector_update: irq=29 vector=43 cpu=0 prev_vector=0 prev_cpu=0
kworker/u16:5-152 [001] d..2 1240.903910: vector_alloc: irq=29 vector=43 reserved=1 ret=0
kworker/u16:5-152 [001] d..2 1240.903910: vector_config: irq=29 vector=43 cpu=0 apicdest=0x00000001
kworker/u16:5-152 [001] d..3 1240.903934: vector_config: irq=29 vector=43 cpu=0 apicdest=0x00000001

root@snb-2600:~# dmesg
[ 1240.255705] PM: hibernation entry
[ 1240.256442] PM: Syncing filesystems ...
[ 1240.463974] PM: done.
[ 1240.463995] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 1240.465984] OOM killer disabled.
[ 1240.466102] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[ 1240.466110] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff]
[ 1240.466113] PM: Marking nosave pages: [mem 0x20000000-0x201fffff]
[ 1240.466120] PM: Marking nosave pages: [mem 0x40000000-0x401fffff]
[ 1240.466128] PM: Marking nosave pages: [mem 0xc6d51000-0xc6dc9fff]
[ 1240.466131] PM: Marking nosave pages: [mem 0xc6dcb000-0xc6e7ffff]
[ 1240.466135] PM: Marking nosave pages: [mem 0xc7000000-0xffffffff]
[ 1240.466583] PM: Basic memory bitmaps created
[ 1240.466744] PM: Preallocating image memory... done (allocated 240575 pages)
[ 1240.647265] PM: Allocated 962300 kbytes in 0.18 seconds (5346.11 MB/s)
[ 1240.647267] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 1240.649623] Suspending console(s) (use no_console_suspend to debug)
[ 1240.696162] ACPI: Preparing to enter system sleep state S4
[ 1240.704807] PM: Saving platform NVS memory
[ 1240.706811] Disabling non-boot CPUs ...
[ 1240.722660] smpboot: CPU 1 is now offline
[ 1240.737691] smpboot: CPU 2 is now offline
[ 1240.753875] smpboot: CPU 3 is now offline
[ 1240.776858] IRQ 28: no longer affine to CPU4
[ 1240.778464] smpboot: CPU 4 is now offline
[ 1240.794329] smpboot: CPU 5 is now offline
[ 1240.806624] IRQ 26: no longer affine to CPU6
[ 1240.807683] smpboot: CPU 6 is now offline
[ 1240.821732] IRQ 23: no longer affine to CPU7
[ 1240.821742] IRQ 30: no longer affine to CPU7
[ 1240.822827] smpboot: CPU 7 is now offline
[ 1240.823732] PM: Creating hibernation image:
[ 1240.913856] PM: Need to copy 238457 pages
[ 1240.913863] PM: Normal pages needed: 238457 + 1024, available pages: 771223
[ 1241.229601] PM: Hibernation image created (238457 pages copied)
[ 1240.825481] PM: Restoring platform NVS memory
[ 1240.826737] Enabling non-boot CPUs ...
[ 1240.826990] x86: Booting SMP configuration:
[ 1240.826995] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1240.830906] cache: parent cpu1 should not be sleeping
[ 1240.832065] CPU1 is up
[ 1240.832163] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 1240.835528] cache: parent cpu2 should not be sleeping
[ 1240.836366] CPU2 is up
[ 1240.836454] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 1240.839971] cache: parent cpu3 should not be sleeping
[ 1240.840914] CPU3 is up
[ 1240.841011] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 1240.844512] cache: parent cpu4 should not be sleeping
[ 1240.845559] CPU4 is up
[ 1240.845638] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 1240.849097] cache: parent cpu5 should not be sleeping
[ 1240.850233] CPU5 is up
[ 1240.850324] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 1240.854011] cache: parent cpu6 should not be sleeping
[ 1240.855404] CPU6 is up
[ 1240.855497] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 1240.859168] cache: parent cpu7 should not be sleeping
[ 1240.860793] CPU7 is up
[ 1240.869240] ACPI: Waking up from system sleep state S4
[ 1240.893247] tg3 0000:02:00.0 enp2s0: Link is down
[ 1240.904832] tg3 0000:02:00.0 enp2s0: Link is down
[ 1240.920850] do_IRQ: 4.33 No irq handler for vector
[ 1241.006990] ------------[ cut here ]------------
[ 1241.007008] vblank wait timed out on crtc 0
[ 1241.007022] WARNING: CPU: 0 PID: 1935 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x17e/0x190
[ 1241.007023] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul broadcom crc32_pclmul bcm_phy_lib snd_hda_intel tg3 snd_hda_codec ghash_clmulni_intel ptp snd_hwdep snd_hda_core pps_core snd_pcm mei_me mei lpc_ich prime_numbers
[ 1241.007058] CPU: 0 PID: 1935 Comm: kworker/u16:11 Not tainted 4.15.0-rc3-valkyria #20
[ 1241.007060] Hardware name: Dell Inc. XPS 8300 /0Y2MRG, BIOS A06 10/17/2011
[ 1241.007064] Workqueue: events_unbound async_run_entry_fn
[ 1241.007068] RIP: 0010:drm_wait_one_vblank+0x17e/0x190
[ 1241.007069] RSP: 0018:ffffc90000817a60 EFLAGS: 00010282
[ 1241.007072] RAX: 000000000000001f RBX: ffff88010d1c0000 RCX: 0000000000000006
[ 1241.007074] RDX: 0000000000001400 RSI: ffffffff81d1d5c1 RDI: ffffffff81ccea26
[ 1241.007076] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1241.007077] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1241.007079] R13: 0000000000000000 R14: 0000000000012157 R15: ffff8801124812b0
[ 1241.007081] FS: 0000000000000000(0000) GS:ffff88012fa00000(0000) knlGS:0000000000000000
[ 1241.007082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1241.007084] CR2: 00007f4bc0dc83b0 CR3: 0000000001e10002 CR4: 00000000000606f0
[ 1241.007086] Call Trace:
[ 1241.007090] ? wait_woken+0x90/0x90
[ 1241.007130] ironlake_crtc_enable+0x7a1/0xbe0 [i915]
[ 1241.007162] intel_update_crtc+0x3d/0x90 [i915]
[ 1241.007187] intel_update_crtcs+0x57/0x70 [i915]
[ 1241.007211] intel_atomic_commit_tail+0x2b4/0xc60 [i915]
[ 1241.007231] ? __i915_sw_fence_complete+0x25/0x230 [i915]
[ 1241.007249] ? __i915_sw_fence_complete+0x25/0x230 [i915]
[ 1241.007275] intel_atomic_commit+0x229/0x2b0 [i915]
[ 1241.007281] ? pci_pm_poweroff+0xf0/0xf0
[ 1241.007285] drm_atomic_helper_commit_duplicated_state+0xd9/0x100
[ 1241.007309] __intel_display_resume+0x7b/0xc0 [i915]
[ 1241.007333] intel_display_resume+0xca/0xf0 [i915]
[ 1241.007359] i915_pm_restore+0xf7/0x1a0 [i915]
[ 1241.007363] pci_pm_thaw+0x44/0x80
[ 1241.007367] dpm_run_callback+0x69/0x320
[ 1241.007372] device_resume+0xa8/0x1c0
[ 1241.007377] ? dpm_watchdog_set+0x60/0x60
[ 1241.007384] async_resume+0x19/0x40
[ 1241.007387] async_run_entry_fn+0x33/0x160
[ 1241.007392] process_one_work+0x1fc/0x690
[ 1241.007400] worker_thread+0x4d/0x3e0
[ 1241.007406] kthread+0x13e/0x170
[ 1241.007409] ? process_one_work+0x690/0x690
[ 1241.007411] ? kthread_create_on_node+0x40/0x40
[ 1241.007417] ret_from_fork+0x24/0x30
[ 1241.007428] Code: 0f ff 31 c0 e9 32 ff ff ff 48 89 e6 4c 89 ff e8 89 5b ad ff 45 85 ed 0f 85 23 ff ff ff 89 ee 48 c7 c7 00 62 cf 81 e8 92 82 a8 ff <0f> ff e9 0e ff ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
[ 1241.007510] ---[ end trace f909772e4522cb60 ]---
[ 1241.220289] ata4: SATA link down (SStatus 0 SControl 300)
[ 1241.220523] ata2: SATA link down (SStatus 0 SControl 300)
[ 1241.220542] ata5: SATA link down (SStatus 0 SControl 300)
[ 1241.220590] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1241.220613] ata3: SATA link down (SStatus 0 SControl 300)
[ 1241.236544] ata1.00: configured for UDMA/133
[ 1243.976023] tg3 0000:02:00.0 enp2s0: Link is up at 1000 Mbps, full duplex
[ 1243.976026] tg3 0000:02:00.0 enp2s0: Flow control is off for TX and off for RX
[ 1251.462964] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
[ 1251.634543] PM: Using 3 thread(s) for compression
[ 1251.634546] PM: Compressing and saving image data (238923 pages)...
[ 1251.634560] PM: Image saving progress: 0%
[ 1251.943212] PM: Image saving progress: 10%
[ 1252.251922] PM: Image saving progress: 20%
[ 1252.444829] PM: Image saving progress: 30%
[ 1252.811870] PM: Image saving progress: 40%
[ 1253.103338] PM: Image saving progress: 50%
[ 1253.473545] PM: Image saving progress: 60%
[ 1253.863775] PM: Image saving progress: 70%
[ 1254.276157] PM: Image saving progress: 80%
[ 1254.548270] PM: Image saving progress: 90%
[ 1254.706700] PM: Image saving progress: 100%
[ 1254.706956] PM: Image saving done
[ 1254.706958] PM: Wrote 955692 kbytes in 3.07 seconds (311.30 MB/s)
[ 1254.707762] PM: S
[ 1254.708033] PM: Swap header not found!
[ 1254.708034] |
[ 1254.959172] PM: Basic memory bitmaps freed
[ 1254.959175] OOM killer enabled.
[ 1254.959176] Restarting tasks ... done.
[ 1254.961527] PM: hibernation exit

# cat /sys/kernel/debug/irq/irqs/28
handler: handle_edge_irq
device: 0000:00:02.0
status: 0x00000400
_IRQ_NOPROBE
istate: 0x00000000
ddepth: 0
wdepth: 0
dstate: 0x01401300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
node: 0
affinity: 0,5-7
effectiv: 0
pending: 4
domain: PCI-MSI-2
hwirq: 0x8000
chip: PCI-MSI
flags: 0x10
IRQCHIP_SKIP_SET_WAKE
parent:
domain: VECTOR
hwirq: 0x1c
chip: APIC
flags: 0x0
Vector: 36
Target: 0

I don't see anything with irq=28, weird..

~Maarten