Re: [PATCH] iommu/vt-d: Check for non-NULL domain on device release

From: Robin Murphy
Date: Tue Jan 16 2024 - 20:57:50 EST


On 2024-01-16 3:22 pm, Jason Gunthorpe wrote:
On Sat, Jan 13, 2024 at 10:17:13AM -0800, Eric Badger wrote:
In the kdump kernel, the IOMMU will operate in deferred_attach mode. In
this mode, info->domain may not yet be assigned by the time the
release_device function is called, which leads to the following crash in
the crashkernel:

This never worked right? But why are you getting to release in a crash
kernel in the first place, that seems kinda weird..

BUG: kernel NULL pointer dereference, address: 000000000000003c
...
RIP: 0010:do_raw_spin_lock+0xa/0xa0
...
_raw_spin_lock_irqsave+0x1b/0x30
intel_iommu_release_device+0x96/0x170
iommu_deinit_device+0x39/0xf0
__iommu_group_remove_device+0xa0/0xd0
iommu_bus_notifier+0x55/0xb0
notifier_call_chain+0x5a/0xd0
blocking_notifier_call_chain+0x41/0x60
bus_notify+0x34/0x50
device_del+0x269/0x3d0
pci_remove_bus_device+0x77/0x100
p2sb_bar+0xae/0x1d0
...
i801_probe+0x423/0x740

Signed-off-by: Eric Badger <ebadger@xxxxxxxxxxxxxxx>

It should have a fixes line, but I'm not sure what it is..

---
drivers/iommu/intel/iommu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Unfortunately this issue is likely systemic in all the drivers.

All two of the drivers which support deferred attach, that is.

Something I've been thinking about for a while now is to have the
option for the core code release to set the driver to a specific
releasing domain (probably a blocking or identity global static)

A lot of drivers are open coding this internally in their release
functions, like Intel. But it really doesn't deserve to be special.

Either way it shouldn't apply in this case, though. Crash kernels *are* special. The whole point of deferred attach is that we don't disturb anything unless we've got as far as definitely using a given default domain (from which we can infer the relevant client device should have been reset and quiesced). Thus regardless of why release might get called, if a deferred attach never happened then the release should still avoid touching any hardware configuration either.

I'd suggest using dev->iommu->attach_deferred as the condition rather than a NULL domain, to be clear that it's the *only* valid reason to not have a domain at this point.

Thanks,
Robin.