Re: [PATCH v2 1/1] psi: stop relying on timer_pending for poll_work rescheduling

From: Suren Baghdasaryan
Date: Thu Jul 01 2021 - 13:46:14 EST


On Thu, Jul 1, 2021 at 9:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Thu, Jul 01, 2021 at 10:58:24AM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 30, 2021 at 01:51:51PM -0700, Suren Baghdasaryan wrote:
> > > +	/* cmpxchg should be called even when !force to set poll_scheduled */
> > > +	if (atomic_cmpxchg(&group->poll_scheduled, 0, 1) && !force)
> > >  		return;
> >
> > Why is that a cmpxchg() ?
>
> I now realize you had already pointed that out, but I dismissed it in
> the context of poll_lock not being always taken after all.
>
> But you're right, cmpxchg indeed seems inappropriate. xchg will do
> just fine for this binary toggle.
>
> When it comes to ordering, looking at it again, I think we actually
> need ordering here that the seqcount doesn't provide. We have:
>
> timer:
>   scheduled = 0
>   smp_rmb()
>   x = state
>
> scheduler:
>   state = y
>   smp_wmb()
>   if xchg(scheduled, 1) == 0
>     mod_timer()
>
> Again, the requirement is that when the scheduler sees the timer as
> already or still pending, the timer must observe its state updates -
> otherwise we miss poll events.
>
> The seqcount provides the wmb and rmb, but the scheduler-side read of
> @scheduled mustn't be reordered before the write to @state. Likewise,
> the timer-side read of @state also mustn't occur before the write to
> @scheduled.
>
> AFAICS this is broken, not just in the patch, but also in the current
> code when timer_pending() on the scheduler side gets reordered. (Not
> sure if timer reading state can be reordered before the detach_timer()
> of its own expiration, but I don't see full ordering between them.)
>
> So it seems to me we need the ordered atomic_xchg() on the scheduler
> side, and on the timer side an smp_mb() after we set scheduled to 0.

Thanks for the analysis, Johannes. Let me dwell on it a bit.
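
For reference, the ordering proposed above (a fully ordered xchg on the
scheduler side, a full barrier after clearing the flag on the timer side)
can be sketched in userspace with C11 atomics. This is only an
illustration under stated assumptions, not the psi code: struct
poll_sketch, timer_side(), scheduler_side() and poll_sketch_demo() are
hypothetical names, a plain int stands in for the polled state, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb() and for
the barrier that the kernel's value-returning atomic_xchg() implies before
the exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant psi_group fields. */
struct poll_sketch {
	atomic_int poll_scheduled;	/* 1 while poll work is scheduled */
	atomic_int state;		/* placeholder for the polled state */
};

static void poll_sketch_init(struct poll_sketch *g)
{
	atomic_init(&g->poll_scheduled, 0);
	atomic_init(&g->state, 0);
}

/* Timer side: clear the flag, full barrier, then read the state. */
static int timer_side(struct poll_sketch *g)
{
	atomic_store_explicit(&g->poll_scheduled, 0, memory_order_relaxed);
	/* Assumed analogue of smp_mb(): the load below cannot pass the store. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&g->state, memory_order_relaxed);
}

/*
 * Scheduler side: publish the state, full barrier, then exchange the flag.
 * Returns true when the caller should arm the timer (mod_timer() in the
 * pseudocode above).
 */
static bool scheduler_side(struct poll_sketch *g, int y)
{
	atomic_store_explicit(&g->state, y, memory_order_relaxed);
	/* Assumed analogue of the barrier preceding a fully ordered xchg. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange_explicit(&g->poll_scheduled, 1,
					memory_order_relaxed) == 0;
}

/* Single-threaded walk through the flag handshake. */
void poll_sketch_demo(void)
{
	struct poll_sketch g;

	poll_sketch_init(&g);
	assert(scheduler_side(&g, 1));	/* flag was 0: arm the timer */
	assert(!scheduler_side(&g, 2));	/* flag already 1: do not re-arm */
	assert(timer_side(&g) == 2);	/* timer sees the latest state */
	assert(scheduler_side(&g, 3));	/* flag cleared by timer: arm again */
}
```

With a full fence on both sides, the store-buffering outcome (the
scheduler sees the flag still set while the timer reads the old state)
is ruled out, which is the missed-poll-event case described above. The
demo only exercises the flag logic in one thread; it does not
demonstrate the ordering itself.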