Re: [RFC PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

From: Will Deacon
Date: Tue Feb 26 2019 - 13:26:33 EST


Hi Linus,

Thanks for having a look.

On Fri, Feb 22, 2019 at 01:49:32PM -0800, Linus Torvalds wrote:
> On Fri, Feb 22, 2019 at 10:50 AM Will Deacon <will.deacon@xxxxxxx> wrote:
> >
> > +#ifndef mmiowb_set_pending
> > +static inline void mmiowb_set_pending(void)
> > +{
> > +	__this_cpu_write(__mmiowb_state.mmiowb_pending, 1);
> > +}
> > +#endif
> > +
> > +#ifndef mmiowb_spin_lock
> > +static inline void mmiowb_spin_lock(void)
> > +{
> > +	if (__this_cpu_inc_return(__mmiowb_state.nesting_count) == 1)
> > +		__this_cpu_write(__mmiowb_state.mmiowb_pending, 0);
> > +}
> > +#endif
>
> The case we want to go fast is the spin-lock and unlock case, not the
> "set pending" case.
>
> And the way you implemented this, it's exactly the wrong way around.
>
> So I'd suggest instead doing
>
> static inline void mmiowb_set_pending(void)
> {
>     __this_cpu_write(__mmiowb_state.mmiowb_pending,
>                      __mmiowb_state.nesting_count);
> }
>
> and
>
> static inline void mmiowb_spin_lock(void)
> {
>     __this_cpu_inc(__mmiowb_state.nesting_count);
> }
>
> which makes that spin-lock code much simpler and avoids the conditional there.

Makes sense; I'll hook that up for the next version.

> Then the unlock case could be something like
>
> static inline void mmiowb_spin_unlock(void)
> {
>     if (unlikely(__this_cpu_read(__mmiowb_state.mmiowb_pending))) {
>         __this_cpu_write(__mmiowb_state.mmiowb_pending, 0);
>         mmiowb();
>     }
>     __this_cpu_dec(__mmiowb_state.nesting_count);
> }
>
> or something (xchg is generally much more expensive than read, and the
> common case for spinlocks is that nobody did IO inside of it).

So I *am* using __this_cpu_xchg() here, which means the architecture can
get away with plain old loads and stores (which is what RISC-V does, for
example). I see that's not the case on e.g. x86, though, so I'll rework it
to use __this_cpu_read() and __this_cpu_write(), since that doesn't hurt.
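
For reference, here is a rough user-space model of how the three helpers
might fit together after that rework. It is only a sketch under stated
assumptions, not the code for the next version: a single global struct
stands in for the per-CPU __mmiowb_state, plain loads and stores stand in
for the __this_cpu_*() accessors, mmiowb() only bumps a counter, and the
names barriers_issued and locked_section() are made up for illustration.

```c
#include <assert.h>

/* Stand-in for the per-CPU __mmiowb_state (assumption: one "CPU" only). */
struct mmiowb_state {
	unsigned int nesting_count;	/* number of spinlocks currently held */
	unsigned int mmiowb_pending;	/* non-zero: MMIO write done under a lock */
};

static struct mmiowb_state state;
static unsigned int barriers_issued;	/* hypothetical: counts mmiowb() calls */

/* Model of the barrier: a real implementation would order MMIO writes. */
static void mmiowb(void)
{
	barriers_issued++;
}

/* Called from the MMIO write path: stays zero when no lock is held. */
static void mmiowb_set_pending(void)
{
	state.mmiowb_pending = state.nesting_count;
}

/* Lock fast path: unconditional increment, no branch. */
static void mmiowb_spin_lock(void)
{
	state.nesting_count++;
}

/* Unlock: plain read first, write and barrier only in the rare IO case. */
static void mmiowb_spin_unlock(void)
{
	assert(state.nesting_count > 0);

	if (state.mmiowb_pending) {
		state.mmiowb_pending = 0;
		mmiowb();
	}
	state.nesting_count--;
}

/*
 * Hypothetical driver: take the lock, optionally do "MMIO", drop the lock.
 * Returns the number of barriers issued by the unlock path.
 */
static unsigned int locked_section(int do_io)
{
	unsigned int before = barriers_issued;

	mmiowb_spin_lock();
	if (do_io)
		mmiowb_set_pending();
	mmiowb_spin_unlock();

	return barriers_issued - before;
}
```

In this model a critical section without IO never touches mmiowb_pending
beyond a single read, and a call to mmiowb_set_pending() outside of any
lock leaves the flag at zero because nesting_count is zero.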

Will