Re: [RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk

From: Markus Blöchl
Date: Wed Feb 16 2022 - 06:18:52 EST


On Tue, Feb 15, 2022 at 08:17:31PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 14, 2022 at 10:51:07AM +0100, Markus Blöchl wrote:
> > After the surprise removal of a mounted NVMe disk the pciehp task
> > reliably hangs forever with a trace similar to this one:
>
> Do you have a specific reproducer? At least with doing a
>
> echo 1 > /sys/.../remove
>
> while running fsx on a file system I can't actually reproduce it.

We built our own enclosures with a custom connector to plug the disks in.

So an external Thunderbolt enclosure is probably very similar
(or just ripping an unscrewed NVMe out of the M.2 slot ...).

But as already suggested, QEMU might be very useful here, as it also
allows us to test multiple namespaces and multipath I/O, if you or someone
else wants to check those too (hotplug with multipath I/O really scares me).