Re: [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources

From: Roger Pau Monné
Date: Mon Jul 25 2016 - 06:53:35 EST


On Mon, Jul 25, 2016 at 06:29:22PM +0800, Bob Liu wrote:
>
> On 07/25/2016 05:20 PM, Roger Pau Monné wrote:
> > On Sat, Jul 23, 2016 at 06:18:23AM +0800, Bob Liu wrote:
> >>
> >> On 07/22/2016 07:45 PM, Roger Pau Monné wrote:
> >>> On Fri, Jul 22, 2016 at 05:43:32PM +0800, Bob Liu wrote:
> >>>>
> >>>> On 07/22/2016 05:34 PM, Roger Pau Monné wrote:
> >>>>> On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote:
> >>>>>>
> >>>>>> On 07/22/2016 03:45 PM, Roger Pau Monné wrote:
> >>>>>>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote:
> >>>>>>>>
> >>>>>>>> On 07/21/2016 04:57 PM, Roger Pau Monné wrote:
> >>>>>>>>>> +        blk_mq_freeze_queue(info->rq);
> >>>>>>>>>> +        if (part_in_flight(&info->gd->part0))
> >>>>>>>>>> +                goto out;
> >>>>>>>>>> +
> >>>>>>>>>> +        /*
> >>>>>>>>>> +         * Front                                Backend
> >>>>>>>>>> +         * Switch to XenbusStateClosed
> >>>>>>>>>> +         *                                      frontend_changed():
> >>>>>>>>>> +         *                                       case XenbusStateClosed:
> >>>>>>>>>> +         *                                        xen_blkif_disconnect()
> >>>>>>>>>> +         *                                        Switch to XenbusStateClosed
> >>>>>>>>>> +         * blkfront_resume():
> >>>>>>>>>> +         *                                      frontend_changed():
> >>>>>>>>>> +         *                                       reconnect
> >>>>>>>>>> +         * Wait until XenbusStateConnected
> >>>>>>>>>> +         */
> >>>>>>>>>> +        info->reconfiguring = true;
> >>>>>>>>>> +        xenbus_switch_state(info->xbdev, XenbusStateClosed);
> >>>>>>>>>> +
> >>>>>>>>>> +        /* Poll every 100ms, 1 minute timeout. */
> >>>>>>>>>> +        for (i = 0; i < 600; i++) {
> >>>>>>>>>> +                /*
> >>>>>>>>>> +                 * Wait for the backend to enter XenbusStateClosed;
> >>>>>>>>>> +                 * blkback_changed() will clear reconfiguring.
> >>>>>>>>>> +                 */
> >>>>>>>>>> +                if (!info->reconfiguring)
> >>>>>>>>>> +                        goto resume;
> >>>>>>>>>> +                schedule_timeout_interruptible(msecs_to_jiffies(100));
> >>>>>>>>>> +        }
> >>>>>>>>>
> >>>>>>>>> Instead of having this wait, could you just set info->reconfiguring = 1, set
> >>>>>>>>> the frontend state to XenbusStateClosed and mimic exactly what a resume from
> >>>>>>>>> suspension does? blkback_changed would have to set the frontend state to
> >>>>>>>>> InitWait when it detects that the backend has switched to Closed, and call
> >>>>>>>>> blkfront_resume.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I think that won't work.
> >>>>>>>> In the real "resume" case, the power management system triggers the whole
> >>>>>>>> ->resume() path, but there is no such path for dynamic configuration.
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I think it should be possible to set info->reconfiguring and wait for the
> >>>>>>> backend to switch to state Closed; at that point we should call blkif_resume
> >>>>>>> (from blkback_changed) and the backend will follow the reconnection.
> >>>>>>>
> >>>>>>
> >>>>>> Okay, I get your point. Yes, that's an option.
> >>>>>>
> >>>>>> Okay, I get your point. Yes, that's an option.
> >>>>>>
> >>>>>> But this will make 'dynamic configuration' asynchronous, and I'm worried
> >>>>>> the end-user will panic.
> >>>>>> E.g. an end-user runs "echo <new value> > /sys/devices/vbd-xxx/max_indirect_segs",
> >>>>>> but then the device is Closed and disappears, and the user has to wait an
> >>>>>> indeterminate time before the device resumes.
> >>>>>
> >>>>> That should not happen, AFAICT on migration the device never disappears.
> >>>>
> >>>> Oh, yes.
> >>>>
> >>>>> alloc_disk and friends should not be called on resume from migration (see
> >>>>> the switch in blkfront_connect; you should take the BLKIF_STATE_SUSPENDED
> >>>>> path for the reconfiguration).
> >>>>>
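(Roughly, the path I mean in blkfront_connect; a simplified sketch of the
switch, not the exact code:

    switch (info->connected) {
    case BLKIF_STATE_CONNECTED:
            /* Just a capacity update; the gendisk is kept. */
            return;
    case BLKIF_STATE_SUSPENDED:
            /* Reconnect reusing the existing gendisk. */
            blkif_recover(info);
            return;
    default:
            break;
    }
    /* Only the fall-through path allocates a new gendisk. */

So a reconfiguration that goes through BLKIF_STATE_SUSPENDED never
re-allocates the disk.)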
> >>>>
> >>>> What if the end-user starts I/O immediately after writing the new value to
> >>>> /sys, while the resume is still in progress?
> >>>
> >>> blkif_free already stops the queues by calling blk_mq_stop_hw_queues, and
> >>> blkif_queue_request will refuse to put anything on the ring if the state is
> >>> different from connected, which in turn makes blkif_queue_rq call
> >>> blk_mq_stop_hw_queue to stop the queue, so it should be safe.
> >>>
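(For reference, the checks I mean, simplified from the driver:

    /* blkif_queue_request() refuses anything while not connected... */
    if (unlikely(rinfo->dev_info->connected != BLKIF_STATE_CONNECTED))
            return 1;

    /* ...which makes blkif_queue_rq() park the hardware queue: */
    blk_mq_stop_hw_queue(hctx);
    return BLK_MQ_RQ_QUEUE_BUSY;
)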
> >>
> >> But this will surprise the end-user; our user script looks like this:
> >> 1) echo <new value> > /sys/xxxx
> >> 2) Start I/O immediately.
> >> ^^^ Fails because requests would be refused (even though the software queue
> >> was still frozen).
> >
> > Which error do you get? AFAICT reads/writes should either block when the
>
> You are right, the reads/writes will just block, which is fine.
>
> > queue is full, or return EAGAIN if the fd is in non-blocking mode (which a
> > properly coded application should be prepared to deal with gracefully if
> > it's using non-blocking fds anyway).
> >
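(E.g. a writer on a non-blocking fd would do something like:

    ssize_t n = write(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Queue full or frozen: wait for writability and retry. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            poll(&pfd, 1, -1);
    }

instead of treating EAGAIN as a hard failure.)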
> >> It's not good for the end-user to have to wait an indeterminate time before
> >> restarting I/O.
> >>
> >>
> >> There are two more concerns I have:
> >> * blkif_resume() may fail; how can the end-user be aware of that if
> >>   "echo <new value> > /sys/xxx" has already returned success?
> >
> > If you really think this is needed, can't you use a monitor or some kind of
> > condition with a timeout instead of open-coding it? Although I'm still not
> > convinced that blocking here is TRTTD.
> >
>
> Let me ask in another way: if we move blkfront_resume() into blkback_changed(),
> should we check the return value of blkfront_resume()? And what should we do
> if it returns an error?

If it returns an error you should close the device. If you want a more
advanced solution, you could fall back to the previously working parameters
and try to reconnect again, but TBH, I doubt that this is going to fix
whatever issue caused the first error.
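
Something along these lines in blkback_changed (a rough sketch, with dev
being the xenbus_device passed to blkback_changed; the exact error message
and cleanup are up to you):

    case XenbusStateClosed:
            if (info->reconfiguring) {
                    int err = blkfront_resume(dev);

                    if (err) {
                            /* Reconnection failed: report and close. */
                            xenbus_dev_fatal(dev, err,
                                             "resuming after reconfiguration");
                            info->reconfiguring = false;
                    }
                    break;
            }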

> >> * We get the device lock and blk_mq_freeze_queue() in dynamic_reconfig_device(),
> >>   and then have to release them in blkif_recover() asynchronously, which makes
> >>   the code harder to read.
> >
> > I'm not sure I'm following here, do you mean that you will take the lock in
> > dynamic_reconfig_device and release it in blkif_recover? Why wouldn't you
>
> Yes.
>
> > release the lock in dynamic_reconfig_device after doing whatever is needed?
> >
>
> Both 'dynamic configuration' and migration (xenbus_dev_resume()) use
> { blkfront_resume(); blkif_recover() } and depend on changes to xbdev->state.
> If they happen simultaneously, the state machine of xbdev->state is going to
> be a mess and very difficult to track.
>
> The lock (actually a mutex) acts as a big lock to ensure no race can happen
> at all.

So the only thing that you should do is set the frontend state to Closed, wait
for the backend to also switch to state Closed, and then switch the frontend
state to init again in order to trigger a reconnection.
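
IOW, something like (a minimal sketch; where exactly the resume call lives
is up to you):

    /* In dynamic_reconfig_device(): */
    info->reconfiguring = true;
    xenbus_switch_state(info->xbdev, XenbusStateClosed);

    /* In blkback_changed(), once the backend reaches XenbusStateClosed: */
    if (info->reconfiguring)
            blkfront_resume(info->xbdev);

blkfront_resume then switches the frontend state again and the reconnection
proceeds exactly as it does after a migration.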

You are right that all of this depends on the state being updated correctly,
but I don't see how that's different from a normal connection or resume, and
we don't seem to have races there.

Roger.