RE: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

From: Durrant, Paul
Date: Mon Dec 09 2019 - 07:40:57 EST


> -----Original Message-----
> From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Sent: 09 December 2019 12:26
> To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Juergen
> Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
>
> On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > Sent: 09 December 2019 11:39
> > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > > closed
> > >
> > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > Only force state to closed in the case when the toolstack may need to
> > > > clean up. This can be detected by checking whether the state in xenstore
> > > > has been set to closing prior to device removal.
> > >
> > > I'm not sure I see the point of this, I would expect that a failure to
> > > probe or the removal of the device would leave the xenbus state as
> > > closed, which is consistent with the actual driver state.
> > >
> > > Can you explain what's the benefit of leaving a device without a
> > > driver in such unknown state?
> > >
> >
> > If probe fails then I think it should leave the state alone. If the
> > state is moved to closed then basically you just killed that
> > connection to the guest (as the frontend will normally close down
> > when it sees this change) so, if the probe failure was due to a bug
> > in blkback or, e.g., a transient resource issue then it's game over
> > as far as that guest goes.
>
> But the connection can be restarted by switching the backend to the
> init state again.

Too late. By then the frontend has already seen closed and started closing down, so you have already lost.

>
> > The ultimate goal here is PV backend re-load that is completely
> > transparent to the guest. Modifying anything in xenstore compromises that
> > so we need to be careful.
>
> That's a fine goal, but not switching to closed state in
> xenbus_dev_remove seems wrong, as you have actually left the frontend
> without a matching backend and with the state not set to closed.
>

Why is this a problem? With this series fully applied a (block) backend can come and go without needing to change the state. Relying on guests to DTRT is not a sustainable option for a cloud deployment.

> Ie: that would be fine if you explicitly state this is some kind of
> internal blkback reload, but not for the general case where blkback
> has been unbound. I think we need some way to differentiate a blkback
> reload from an unbind.
>

Why do we need that though? Why is it advantageous for a backend to go to closed? No PV backends cope with an unbind as-is, and a toolstack initiated unplug will always set state to 5 (closing) anyway. So TBH any state transition done directly in the xenbus code looks wrong to me anyway (but appears to be a necessary evil to keep the toolstack working in the event it spawns a backend where there is actually no driver present, or it doesn't come online).
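
For reference, the rule the patch description proposes can be sketched roughly as below. This is only an illustration, not the patch itself: the enum values are the ones from Xen's public io/xenbus.h, but should_force_closed() and state_after_remove() are hypothetical names made up for this example, and the real code would read the state from xenstore rather than take it as an argument.

```c
#include <stdbool.h>

/* State values as defined in Xen's public io/xenbus.h header. */
enum xenbus_state {
    XenbusStateUnknown       = 0,
    XenbusStateInitialising  = 1,
    XenbusStateInitWait      = 2,
    XenbusStateInitialised   = 3,
    XenbusStateConnected     = 4,
    XenbusStateClosing       = 5,
    XenbusStateClosed        = 6,
    XenbusStateReconfiguring = 7,
    XenbusStateReconfigured  = 8
};

/*
 * Hypothetical helper: on device removal, only force the state to closed
 * if xenstore already says closing, i.e. the toolstack started an unplug
 * and may be waiting to clean up.
 */
static bool should_force_closed(enum xenbus_state state_in_xenstore)
{
    return state_in_xenstore == XenbusStateClosing;
}

/*
 * Hypothetical helper: the state left in xenstore once the backend has
 * been removed. Anything other than closing is left untouched, so the
 * frontend sees no change.
 */
static enum xenbus_state state_after_remove(enum xenbus_state state_in_xenstore)
{
    return should_force_closed(state_in_xenstore) ? XenbusStateClosed
                                                  : state_in_xenstore;
}
```

So a connected backend that is unbound stays at 4 as far as the frontend can tell, and only a toolstack initiated unplug (state already 5) ends up at 6.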

Paul