Re: [REGRESSION] Failed network caused by: xhci: switch to pci_alloc_irq_vectors

From: Greg Kroah-Hartman
Date: Fri May 19 2017 - 01:42:40 EST


On Thu, May 18, 2017 at 11:42:34PM -0400, Steven Rostedt wrote:
>
> One of the configs I use to test ftrace with (configs that have
> caused failures in the past) has lots of irq issues and fails to
> initialize the network of my box. I bisected the problem down to a
> single commit, and when I revert that commit, my box boots without any
> network or irq issues.
>
> Note, my other configs work fine on this box. I haven't investigated
> which config option is the culprit. But since it used to work with this
> config, I want to report it.

So what commit is causing the problem?

It looks like the ehci driver is having problems, but first, your
interrupts are whack:

> irq 16: nobody cared (try booting with the "irqpoll" option)
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-rc1-test-dirty #24
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
> <IRQ>
> devtmpfs: mounted
> dump_stack+0x9a/0xd6
> __report_bad_irq+0x35/0xc0
> note_interrupt+0x234/0x270
> handle_irq_event_percpu+0x45/0x60
> handle_irq_event+0x39/0x60
> handle_fasteoi_irq+0x8f/0x160
> handle_irq+0x6f/0x110
> do_IRQ+0x46/0xd0
> common_interrupt+0x93/0x93
> RIP: 0010:native_safe_halt+0x6/0x10
> RSP: 0000:ffffb54240cd7e90 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff7e
> RAX: 0000000000000000 RBX: ffff8ea214498040 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffb54240cd7e90 R08: 0000000000000001 R09: 0000000041129b0c
> R10: ffffb54240cd7d68 R11: 0000000000000001 R12: 0000000000000002
> R13: ffff8ea214498040 R14: 0000000000000000 R15: ffff8ea214498040
> </IRQ>
> default_idle+0x38/0x160
> arch_cpu_idle+0xf/0x20
> default_idle_call+0x28/0x50
> do_idle+0x182/0x220
> cpu_startup_entry+0x1d/0x20
> start_secondary+0x132/0x160
> secondary_startup_64+0x9f/0x9f
> handlers:
> [<ffffffff9a6421a0>] xhci_msi_irq
> Disabling IRQ #16

Have you tried taking the kernel's advice? :)

> ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 3
> ehci-pci 0000:00:1a.0: debug port 2
> ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
> genirq: Flags mismatch irq 16. 00000080 (ehci_hcd:usb3) vs. 00000000 (xhci_hcd)

What does that mean?

> CPU: 0 PID: 307 Comm: modprobe Tainted: G E 4.12.0-rc1-test-dirty #24
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
> dump_stack+0x9a/0xd6
> __setup_irq+0x5d4/0x630
> request_threaded_irq+0x10d/0x190
> usb_add_hcd+0x658/0x970
> ? for_each_companion+0x3e/0xb0
> usb_hcd_pci_probe+0x3e4/0x490
> ehci_pci_probe+0x36/0x40 [ehci_pci]
> local_pci_probe+0x45/0xa0
> ? pci_match_device+0xca/0x110
> pci_device_probe+0xdb/0x130
> driver_probe_device+0x2ed/0x480
> __driver_attach+0xd5/0x100
> ? driver_probe_device+0x480/0x480
> bus_for_each_dev+0x62/0xa0
> driver_attach+0x1e/0x20
> bus_add_driver+0x1c6/0x290
> driver_register+0x60/0xe0
> __pci_register_driver+0x60/0x70
> ? 0xffffffffc0346000
> ehci_pci_init+0x6a/0x1000 [ehci_pci]
> do_one_initcall+0x43/0x190
> ? kmem_cache_alloc_trace+0x1be/0x200
> do_init_module+0x7d/0x210
> load_module+0x1891/0x1eb0
> ? vmap_page_range_noflush+0x29b/0x370
> ? show_coresize+0x30/0x30
> SYSC_init_module+0x143/0x180
> ? load_module+0x5/0x1eb0
> ? SYSC_init_module+0x143/0x180
> SyS_init_module+0xe/0x10
> entry_SYSCALL_64_fastpath+0x23/0xc2
> RIP: 0033:0x3b918e0ffa
> RSP: 002b:00007ffd11d575c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000000000061f950 RCX: 0000003b918e0ffa
> RDX: 000000000061f7d0 RSI: 00000000000036b0 RDI: 000000000062c9e0
> RBP: 0000000000000000 R08: 0000000000630090 R09: 00007f019c07c700
> R10: 00007ffd11d574f0 R11: 0000000000000246 R12: 0000000000626200
> R13: 000000000061f930 R14: 0000000000000000 R15: 000000000061f420
> ehci-pci 0000:00:1a.0: request interrupt 16 failed

So ehci can't use the same irq line as xhci? No sharing allowed?
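
For reference, 0x00000080 in that genirq message is IRQF_SHARED, so the
log reads as ehci_hcd asking for a shared line while the handler already
on irq 16 (xhci_hcd) was registered with flags 0, i.e. not shared. Below
is a minimal userspace sketch of the compatibility test, assuming it
mirrors the check in __setup_irq() in kernel/irq/manage.c. The function
name irq_flags_compatible() is made up for illustration, and the per-cpu
check the kernel also performs is left out:

```c
#include <stdbool.h>

/* Flag values as defined in include/linux/interrupt.h. */
#define IRQF_TRIGGER_MASK 0x0000000fUL
#define IRQF_SHARED       0x00000080UL
#define IRQF_ONESHOT      0x00002000UL

/*
 * Hypothetical helper, not a kernel function: simplified model of the
 * test applied when a second handler is requested on a line that
 * already has one. old_flags belongs to the existing handler,
 * new_flags to the one being added.
 */
bool irq_flags_compatible(unsigned long old_flags, unsigned long new_flags)
{
	/* Both handlers must have asked for a shared line. */
	if (!((old_flags & new_flags) & IRQF_SHARED))
		return false;

	/* Both must agree on the trigger type (edge/level bits). */
	if ((old_flags ^ new_flags) & IRQF_TRIGGER_MASK)
		return false;

	/* Both must agree on IRQF_ONESHOT. */
	if ((old_flags ^ new_flags) & IRQF_ONESHOT)
		return false;

	return true;
}
```

With the values from the log, irq_flags_compatible(0x00000000, 0x00000080)
returns false, which would match the "request interrupt 16 failed" line
above. If both sides had passed IRQF_SHARED it would return true.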

But other configs on this same hardware work. Can you do a diff of a
working config vs. the non-working one?

thanks,

greg k-h