Re: SCSI device probing non-deterministic in 5.3

From: Bart Van Assche
Date: Fri Oct 04 2019 - 11:39:06 EST


On 10/3/19 2:19 PM, Randy Dunlap wrote:
> [add linux-scsi mailing list]
>
> On 10/3/19 1:32 PM, Bradley LaBoon wrote:
>> Hello, LKML!
>>
>> Beginning with kernel 5.3 the order in which SCSI devices are probed and
>> named has become non-deterministic. This is a result of a patch that was
>> submitted to add asynchronous device probing (specifically, commit
>> f049cf1a7b6737c75884247c3f6383ef104d255a). Previously, devices would
>> always be probed in the order in which they exist on the bus, resulting
>> in the first device being named 'sda', the second device 'sdb', and so on.
>>
>> This is important in the case of mass VM deployments where many VMs are
>> created from a single base image. Partition UUIDs cannot be used in the
>> fstab of such an image because the UUIDs will be different for each VM
>> and are not known in advance. Normally you can't rely on device names
>> being consistent between boots, but with QEMU you can set the bus order
>> of each block device and thus we currently use that to control the
>> device order in the guest. With the introduction of the aforementioned
>> patch this is no longer possible and the device ordering is different on
>> every boot, resulting in the guest booting into an emergency shell
>> unless the devices randomly happen to be loaded in the expected order.
>>
>> I have created a patch which reverts back to the previous behavior, but
>> I wanted to open this topic to discussion before posting it. I'm not
>> totally familiar with the low-level details of SCSI device probing, so I
>> don't know if the non-deterministic device order was the intended
>> behavior of the patch or just a side-effect. If that is the intended
>> behavior then is there perhaps some other way to ensure a consistent
>> device ordering for a guest VM?

Have you already looked at the /dev/disk/by-path directory? Here is an example of its contents:

$ (cd /dev/disk/by-path && ls -l | grep /s)
lrwxrwxrwx 1 root root 9 Oct 3 16:49 pci-0000:00:02.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root 9 Oct 3 16:49 pci-0000:00:08.0-scsi-0:0:0:1 -> ../../sr0

Have you considered using these soft links in /etc/fstab?

If using these links would be impractical, have you considered adding a udev rule that creates H:C:I:L (host:channel:id:LUN) soft links under a subdirectory of /dev, pointing at the /dev/sd* device nodes, and using those soft links in /etc/fstab? That is probably a much more elegant solution than what has been proposed above.

As one can see, the information needed to implement such a udev rule is already available in sysfs:

$ (cd /sys/class/scsi_device && ls -ld */device/block/*)
drwxr-xr-x 9 root root 0 Oct 3 16:48 2:0:0:1/device/block/sr0
drwxr-xr-x 9 root root 0 Oct 3 16:48 3:0:0:0/device/block/sda
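
To make the mapping explicit, here is a minimal sketch of a shell function that walks the same sysfs directory and prints one "H:C:I:L device" pair per line. The function name and the optional sysfs root argument are only for illustration (the argument makes it possible to try the function against a copy of the tree).

```sh
# list_hcil_blockdevs [SYSFS_ROOT]
# Prints "<H:C:I:L> <block device name>" for every SCSI device that
# has a block device attached. SYSFS_ROOT defaults to /sys.
list_hcil_blockdevs() {
    sysfs_root="${1:-/sys}"
    for path in "$sysfs_root"/class/scsi_device/*/device/block/*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$path" ] || continue
        dev="${path##*/}"                                 # e.g. sda
        hcil="${path#"$sysfs_root"/class/scsi_device/}"   # strip the prefix
        hcil="${hcil%%/*}"                                # e.g. 3:0:0:0
        printf '%s %s\n' "$hcil" "$dev"
    done
}
```

On the system shown above, calling list_hcil_blockdevs without arguments should list 2:0:0:1 next to sr0 and 3:0:0:0 next to sda.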

Bart.