Re: [RFC PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU

From: Andrea Parri
Date: Fri Apr 03 2020 - 08:05:09 EST


On Mon, Mar 30, 2020 at 02:24:16PM +0200, Vitaly Kuznetsov wrote:
> Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:
>
> > From: Andrea Parri <parri.andrea@xxxxxxxxx> Sent: Saturday, March 28, 2020 10:09 AM
> >>
> >> > In case we believe that OFFER -> RESCINF sequence is always ordered
> >> > by the host AND we don't care about other offers in the queue the
> >> > suggested locking is OK: we're guaranteed to process RESCIND after we
> >> > finished processing OFFER for the same channel. However, waiting for
> >> > 'offer_in_progress == 0' looks fishy so I'd suggest we at least add a
> >> > comment explaining that the wait is only needed to serialize us with
> >> > possible OFFER for the same channel - and nothing else. I'd personally
> >> > still slightly prefer the algorythm I suggested as it guarantees we take
> >> > channel_mutex with offer_in_progress == 0 -- even if there are no issues
> >> > we can think of today (not strongly though).
> >>
> >> Does it? offer_in_progress is incremented without channel_mutex...
> >>
>
> No, it does not, you're right, by itself the change is insufficient.
>
> >> IAC, I have no objections to apply the changes you suggested. To avoid
> >> misunderstandings: vmbus_bus_suspend() presents a similar usage... Are
> >> you suggesting that I apply similar changes there?
> >>
> >> Alternatively: FWIW, the comment in vmbus_onoffer_rescind() does refer
> >> to "The offer msg and the corresponding rescind msg...". I am all ears
> >> if you have any concrete suggestions to improve these comments.
> >>
> >
> > Given that waiting for 'offer_in_progress == 0' is the current code, I think
> > there's an argument to made for not changing it if the change isn't strictly
> > necessary. This patch set introduces enough change that *is* necessary. :-)
> >
>
> Sure. I was thinking a bit more about this and it seems that over years
> we've made the synchronization of channels code too complex (every time
> for a good reason but still). Now (before this series) we have at least:
>
> vmbus_connection.channel_mutex
> vmbus_connection.offer_in_progress
> channel.probe_done
> channel.rescind
> Workqueues (vmbus_connection.work_queue,
> queue_work_on(vmbus_connection.connect_cpu),...)
> channel.lock spinlock (the least of the problems)
>
> Maybe there's room for improvement? Out of top of my head I'd suggest a
> state machine for each channel (e.g something like
> OFFERED->OPENING->OPEN->RESCIND_REQ->RESCINDED->CLOSED) + refcounting
> (subchannels, open/rescind/... requests in progress, ...) + non-blocking
> request handling like "Can we handle this rescind offer now? No,
> refcount is too big. OK, rescheduling the work". Maybe not the best
> design ever and I'd gladly support any other which improves the
> readability of the code and makes all state changes and synchronization
> between them more obvious.
>
> Note, VMBus channel handling driven my messages (unlike events for ring
> buffer) is not performance critical, we just need to ensure completeness
> (all requests are handled correctly) with forward progress guarantees
> (no deadlocks).
>
> I understand the absence of 'hot' issues in the current code is what can
> make the virtue of redesign questionable and sorry for hijacking the
> series which doesn't seem to make things worse :-)

(Back here... Sorry for the delay.)

FWIW, what you wrote above makes sense to me; and *yes*, the series in
question was not intended as "let us undertake such a redesign" (quite
the opposite in fact...)

With regard to this specific patch, it seems to me that we've reached
a certain consensus or, at least, I don't see complaints ;-). Please
let me know if I misunderstood.

Thanks,
Andrea