[BUG] NULL-Pointer crash in mutex_can_spin_on_owner

From: Frank Wunderlich
Date: Thu Nov 28 2019 - 06:47:50 EST


Hi,

i got a crash in mutex_can_spin_on_owner while trying to get mPCIe card (ath10 driver) running in bananapi r64 (aarch64) using kernel 5.4

[ 11.381653] DEBUG: Passed mtk_pcie_irq_domain_alloc 441 port:0x00000000bc0b7874 lock:0x0000000023f7fc2a
[ 11.391943] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000260
[ 11.402178] Mem abort info:
[ 11.406356] ESR = 0x96000005
[ 11.410868] EC = 0x25: DABT (current EL), IL = 32 bits
[ 11.417530] SET = 0, FnV = 0
[ 11.421997] EA = 0, S1PTW = 0
[ 11.425220] Data abort info:
[ 11.428196] ISV = 0, ISS = 0x00000005
[ 11.428199] CM = 0, WnR = 0
[ 11.428203] user pgtable: 4k pages, 39-bit VAs, pgdp=000000007c24b000
[ 11.428206] [0000000000000260] pgd=0000000000000000, pud=0000000000000000
[ 11.428214] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 11.428218] Modules linked in: ath10k_pci(+) ath10k_core mt7622(+) mt76 ath mac80211 libarc4 cfg80211 btmtkuart
034-ga5669c8a8027-dirty #32
[ 11.428243] Hardware name: Bananapi BPI-R64 (DT)
[ 11.428248] pstate: 00000005 (nzcv daif -PAN -UAO)
[ 11.428260] pc : mutex_can_spin_on_owner+0x30/0x5c
[ 11.428264] lr : mutex_can_spin_on_owner+0x24/0x5c
[ 11.428268] sp : ffffffc010c3b580
[ 11.428271] x29: ffffffc010c3b580 x28: 0000000000080000
[ 11.428275] x27: ffffffc010c3b838 x26: 0000000000000001
[ 11.428280] x25: 000000000000008d x24: 0000000000000002
[ 11.428284] x23: ffffff803d4e6800 x22: ffffff803e155500
[ 11.428288] x21: 0000000000000001 x20: ffffffc010808000
[ 11.428293] x19: ffffff803e155500 x18: 0000000000000000
[ 11.428297] x17: 0000000000000000 x16: 0000000000000000
[ 11.428302] x15: 0000000000000000 x14: 0000000000000000
[ 11.428306] x13: 0000000000000000 x12: 0000000000000001
[ 11.428310] x11: 0000000000000001 x10: 00000000000007c0
[ 11.428315] x9 : ffffffc010c3b3c0 x8 : ffffff803fdb07d0
[ 11.428319] x7 : ffffff803e155500 x6 : ffffff800336a400
[ 11.428324] x5 : 0000000000000220 x4 : 0000000000000220
[ 11.428328] x3 : ffffffc0108b7000 x2 : ffffff800336a400
[ 11.428332] x1 : ffffff800336a400 x0 : 0000000000000220
[ 11.428337] Call trace:
[ 11.428342] mutex_can_spin_on_owner+0x30/0x5c
[ 11.428349] __mutex_lock.isra.9+0x58/0x2a4
[ 11.428356] __mutex_lock_slowpath+0x10/0x18
[ 11.428360] mutex_lock+0x44/0x68
[ 11.428366] mtk_pcie_irq_domain_alloc+0x60/0x120
[ 11.428371] irq_domain_alloc_irqs_hierarchy+0x14/0x1c
[ 11.428375] irq_domain_alloc_irqs_parent+0x14/0x24
[ 11.428379] msi_domain_alloc+0x90/0x130
[ 11.428382] irq_domain_alloc_irqs_hierarchy+0x14/0x1c
[ 11.428386] __irq_domain_alloc_irqs+0x140/0x2b4
[ 11.428390] msi_domain_alloc_irqs+0x134/0x2c4
[ 11.428394] pci_msi_setup_msi_irqs+0x28/0x38
[ 11.428398] __pci_enable_msi_range+0x208/0x30c
[ 11.428401] pci_enable_msi+0x18/0x28
[ 11.428418] ath10k_pci_probe+0x50c/0x6d8 [ath10k_pci]
[ 11.428423] pci_device_probe+0xb4/0x144
[ 11.428430] really_probe+0x238/0x3f8
[ 11.428436] driver_probe_device+0x114/0x124
[ 11.428440] device_driver_attach+0x40/0x68
[ 11.428444] __driver_attach+0x134/0x138
[ 11.428448] bus_for_each_dev+0x78/0xbc

(gdb) list *(mtk_pcie_irq_domain_alloc+0x60)
0xffffffc0102e68d0 is in mtk_pcie_irq_domain_alloc (drivers/pci/controller/pcie-mediatek.c:446).
441 printk(KERN_ALERT "DEBUG: Passed %s %d port:0x%p lock:0x%p\n",__FUNCTION__,__LINE__,(void *)port,(void *)&port->lock);
442
443 WARN_ON(nr_irqs != 1);
444 mutex_lock(&port->lock);
445
446 printk(KERN_ALERT "DEBUG: Passed %s %d\n",__FUNCTION__,__LINE__);
447 bit = find_first_zero_bit(port->msi_irq_in_use, MTK_MSI_IRQS_NUM);
448 printk(KERN_ALERT "DEBUG: Passed %s %d bit:%lu,max:%u\n",__FUNCTION__,__LINE__,bit,MTK_MSI_IRQS_NUM);
449
450 if (bit >= MTK_MSI_IRQS_NUM) {

(gdb) list *(mutex_can_spin_on_owner+0x30)
0xffffffc0100d96f8 is in mutex_can_spin_on_owner (kernel/locking/mutex.c:605).
600 /*
601 * As lock holder preemption issue, we both skip spinning if task is not
602 * on cpu or its cpu is preempted
603 */
604 if (owner)
605 retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
606 rcu_read_unlock();
607
608 /*
609 * If lock->owner is not set, the mutex has been released. Return true

mutex is initialized in mtk_pcie_allocate_msi_domains which is called before twice, so i guess the mutex itself is correctly initialized.

can you give me some hint to debug it?

regards Frank