Re: [PATCH v2 1/1] psi: stop relying on timer_pending for poll_work rescheduling

From: Suren Baghdasaryan
Date: Fri Jul 02 2021 - 11:50:13 EST


On Fri, Jul 2, 2021 at 2:28 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 01, 2021 at 09:28:04AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Jul 1, 2021 at 9:12 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 01, 2021 at 09:09:25AM -0700, Suren Baghdasaryan wrote:
> > > > On Thu, Jul 1, 2021 at 1:59 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jun 30, 2021 at 01:51:51PM -0700, Suren Baghdasaryan wrote:
> > > > > > +	/* cmpxchg should be called even when !force to set poll_scheduled */
> > > > > > +	if (atomic_cmpxchg(&group->poll_scheduled, 0, 1) && !force)
> > > > > >  		return;
> > > > >
> > > > > Why is that a cmpxchg() ?
> > > >
> > > > We want to set poll_scheduled and proceed with rescheduling the timer
> > > > unless it's already scheduled, so cmpxchg helps us to make that
> > > > decision atomically. Or did I misunderstand your question?
> > >
> > > What's wrong with: atomic_xchg(&group->poll_scheduled, 1) ?
> >
> > Yes, since poll_scheduled can be only 0 or 1 atomic_xchg should work
> > fine here. Functionally equivalent but I assume atomic_xchg() is more
> > efficient due to no comparison.
>
> Mostly conceptually simpler; the cmpxchg-on-0 makes that you have to
> check if there's ever any state outside of {0,1}. The xchg() thing is
> the classical test-and-set pattern.
>
> On top of all that, the cmpxchg() can fail, which brings ordering
> issues.

Oh, I see. That was my mistake. I was wrongly assuming that all RMW
atomic operations are fully ordered, but the documentation indeed
states that:
```
 - RMW operations that have a return value are fully ordered;
 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.
```
So that's the actual functional difference here. Thanks for catching
this and educating me!
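
To make the difference concrete, here is a rough userspace analogy using
C11 atomics rather than the kernel API. The helper names are made up for
illustration, and C11 memory_order_seq_cst is only an approximation of the
kernel's "fully ordered". Both helpers return the previous value of the
flag, so they behave the same as long as the flag is only ever 0 or 1. The
difference is that the compare-exchange takes a separate, weaker ordering
on its failure path, which is the case the documentation calls out:
```c
#include <stdatomic.h>

/*
 * Hypothetical helpers for illustration only; not kernel code.
 * Both return the value the flag held before the call.
 */

/* Rough analogue of atomic_cmpxchg(flag, 0, 1). */
static int test_and_set_cmpxchg(atomic_int *flag)
{
	int old = 0;

	/*
	 * Success: seq_cst read-modify-write.
	 * Failure: only a relaxed load, so no ordering is provided.
	 * On failure, 'old' is overwritten with the value actually seen.
	 */
	atomic_compare_exchange_strong_explicit(flag, &old, 1,
						memory_order_seq_cst,
						memory_order_relaxed);
	return old;
}

/* Rough analogue of atomic_xchg(flag, 1): classic test-and-set. */
static int test_and_set_xchg(atomic_int *flag)
{
	/* Unconditional exchange: there is no failure path at all. */
	return atomic_exchange_explicit(flag, 1, memory_order_seq_cst);
}
```
With the flag already at 1, both helpers return 1, but only the exchange
still acts as an ordering point.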

>
> Typically, I think, you want to ensure that everything that happens
> before psi_schedule_poll_work() is visible to the work when it runs
> (also see Johannes' email).

Correct, and I think I now understand the concern Johannes expressed.

> In case poll_scheduled is already 1, the
> cmpxchg will fail and *NOT* provide that ordering. Meaning the work
> might not observe the latest changes. xchg() doesn't have this subtlety.

Got it.
So I think the modifications needed to this patch are:
1. replacing atomic_cmpxchg(&group->poll_scheduled, 0, 1) with
atomic_xchg(&group->poll_scheduled, 1)
2. adding an explicit smp_mb() barrier right after
atomic_set(&group->poll_scheduled, 0) in psi_poll_work().

I think that should ensure the correct ordering here.
If you folks agree I'll respin v3 with these changes (or maybe I
should respin and we continue discussion with that version?).
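
For reference, the two changes listed above could look roughly like the
sketch below. It is a userspace model with C11 atomics, not the actual
patch: the struct, the function names and the timer_arms counter are
stand-ins I made up, atomic_exchange() stands in for atomic_xchg(), and a
seq_cst fence stands in for smp_mb().
```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of struct psi_group. */
struct psi_group_sketch {
	atomic_int poll_scheduled;	/* only ever 0 or 1 */
	int timer_arms;			/* counts where the timer would be (re)armed */
};

/* Models the scheduling side; returns true if the timer would be armed. */
static bool schedule_poll_work_sketch(struct psi_group_sketch *g, bool force)
{
	/*
	 * Change 1: unconditional test-and-set. The exchange always
	 * happens, so it orders prior writes even when the flag was
	 * already 1.
	 */
	if (atomic_exchange(&g->poll_scheduled, 1) && !force)
		return false;

	g->timer_arms++;	/* placeholder for arming the timer */
	return true;
}

/* Models the start of the work function. */
static void poll_work_sketch(struct psi_group_sketch *g)
{
	/* Plain store, comparable to atomic_set(). */
	atomic_store_explicit(&g->poll_scheduled, 0, memory_order_relaxed);

	/* Change 2: full barrier right after clearing the flag. */
	atomic_thread_fence(memory_order_seq_cst);

	/* ... the work would read the latest state from here on ... */
}
```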
