RE: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device

From: Dong, Eddie
Date: Tue Dec 06 2022 - 13:00:55 EST




> -----Original Message-----
> From: Christoph Hellwig <hch@xxxxxx>
> Sent: Tuesday, December 6, 2022 7:36 AM
> To: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>; Rao, Lei <Lei.Rao@xxxxxxxxx>;
> kbusch@xxxxxxxxxx; axboe@xxxxxx; kch@xxxxxxxxxx; sagi@xxxxxxxxxxx;
> alex.williamson@xxxxxxxxxx; cohuck@xxxxxxxxxx; yishaih@xxxxxxxxxx;
> shameerali.kolothum.thodi@xxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>;
> mjrosato@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> nvme@xxxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Dong, Eddie
> <eddie.dong@xxxxxxxxx>; Li, Yadong <yadong.li@xxxxxxxxx>; Liu, Yi L
> <yi.l.liu@xxxxxxxxx>; Wilk, Konrad <konrad.wilk@xxxxxxxxxx>;
> stephen@xxxxxxxxxxxxx; Yuan, Hang <hang.yuan@xxxxxxxxx>
> Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device
>
> On Tue, Dec 06, 2022 at 11:28:12AM -0400, Jason Gunthorpe wrote:
> > I'm interested as well, my mental model goes as far as mlx5 and
> > hisilicon, so if nvme prevents the VFs from being contained units, it
> > is a really big deviation from VFIO's migration design..
>
> In NVMe the controller (which maps to a PCIe physical or virtual
> function) is unfortunately not very self contained. A lot of state is
> subsystem-wide, where the subsystem is, roughly speaking, the container
> for all controllers that share storage. That is the right thing to do for,
> say, dual-ported SSDs that are used for clustering or multi-pathing; for
> tenant isolation it is about as wrong as it gets.


The NVMe spec is general, but implementation details (such as internal state)
may be vendor specific. If migration happens between two identical NVMe devices
(same vendor and device, with the same firmware version), migration of
subsystem-wide state can be covered naturally, right?
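
To make the "identical devices" condition concrete, here is a minimal sketch
of the kind of check I have in mind before a migration is allowed. Everything
in it is hypothetical and not part of this patch series: the struct
nvme_mig_ident, its fields, and nvme_mig_ident_match() are names made up for
illustration. The only assumption taken from the spec is that the firmware
revision is an 8-byte string, like the FR field of Identify Controller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical identity record for one end of a migration.
 * Illustration only; nothing like this exists in the patch series.
 */
struct nvme_mig_ident {
	uint16_t vendor_id;	/* PCI vendor ID */
	uint16_t device_id;	/* PCI device ID */
	char fw_rev[8];		/* firmware revision, sized like Identify Controller FR */
};

/*
 * Return true only when source and destination report the same
 * vendor, device and firmware revision, i.e. the case where
 * vendor-specific internal state can be handed over as an opaque blob.
 */
static bool nvme_mig_ident_match(const struct nvme_mig_ident *src,
				 const struct nvme_mig_ident *dst)
{
	return src->vendor_id == dst->vendor_id &&
	       src->device_id == dst->device_id &&
	       memcmp(src->fw_rev, dst->fw_rev, sizeof(src->fw_rev)) == 0;
}
```

With a strict match like this, the format of the saved state never has to be
interpreted outside the device; anything looser would need an agreed format.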

>
> There is nothing in the NVMe spec that prohibits you from implementing
> multiple subsystems for multiple functions of a PCIe device, but if you do that
> there is absolutely no support in the spec to manage shared resources or any
> other interaction between them.

In the IPU/DPU area, exposing multiple VFs through SR-IOV seems to be widely
adopted.

For VFs, the use of shared resources can be viewed as implementation specific,
and saving/loading the state of a VF can rely on the hardware/firmware itself.
Migrating NVMe devices across vendors/devices is another story: it may
be useful, but it brings additional challenges.