Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Michael Ellerman
Date: Wed Jul 18 2018 - 08:31:26 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> On Tue, Jul 17, 2018 at 7:45 AM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
>>
>>
>> Interesting. I don't see anything as high as 18%, it's more spread out:
>>
>> 7.81% context_switch [kernel.kallsyms] [k] cgroup_rstat_updated
>
> Oh, see that's the difference.
>
> You're running in a non-root cgroup, I think.

I'm not much of a cgroup expert, but yeah I think so:

# cat /proc/self/cgroup
7:cpuset:/
6:cpu,cpuacct:/user.slice
5:memory:/user.slice
...
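
For anyone who wants to check this programmatically, here is a minimal
sketch of the test I'm doing by eye above. It assumes each line has the
"hierarchy-ID:controller-list:path" shape shown in the output, and the
helper name cgroup_line_is_root() is made up for illustration, it is not
a kernel or libc function:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if one line of /proc/self/cgroup,
 * assumed to look like "hierarchy-ID:controller-list:path", names the
 * root cgroup, i.e. its path is exactly "/".
 */
static bool cgroup_line_is_root(const char *line)
{
    const char *p;

    assert(line != NULL);

    p = strchr(line, ':');          /* end of the hierarchy ID */
    if (!p)
        return false;
    p = strchr(p + 1, ':');         /* end of the controller list */
    if (!p)
        return false;
    p++;                            /* first character of the path */

    /* Accept "/" on its own, optionally followed by a newline. */
    return p[0] == '/' && (p[1] == '\0' || p[1] == '\n');
}
```

With the output above, the cpuset line would count as root, while the
cpu,cpuacct and memory lines (both under /user.slice) would not.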


I guess I need to boot with init=/bin/sh.

> That also means that your scheduler overhead has way more spinlocks,
> and in particular, you have that
>
> raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
> ..
> raw_spin_lock_irqsave(cpu_lock, flags);
>
> there too.
>
> So you have at least twice the spinlocks that my case had, and yes,
> the costs are way more spread out because your case has all that
> cgroup accounting too.

Yeah OK. And our locks are known to be more expensive generally, so that
and the cgroups seem to account for most of the difference vs x86.

> That said, I don't understand the powerpc memory ordering. I thought
> the rules were "isync on lock, lwsync on unlock".
>
> That's what the AIX docs imply, at least.
>
> In particular, I find:
>
> "isync is not a memory barrier instruction, but the
> load-compare-conditional branch-isync sequence can provide this
> ordering property"
>
> so why are you doing "sync/lwsync", when it sounds like "isync/lwsync"
> (for lock/unlock) is the right thing and would already give memory
> barrier semantics?

I think everyone else in the thread answered this already, and better
than I could.

Our options are sync/lwsync or lwsync/sync (for lock/unlock). I've been
testing lwsync/sync, because then we could also remove the conditional
sync we have in unlock (used when there were IO accesses inside the lock).
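
To make the lwsync/sync shape concrete, here is a rough userspace sketch
in C11 atomics. It is not the kernel's spinlock code. The sketch_* names
are invented for illustration, and it relies on the assumption that
powerpc compilers typically emit lwsync for an acquire fence and sync for
a seq_cst fence:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace lock, not the kernel's arch_spinlock_t. */
struct sketch_lock {
    atomic_int locked;              /* 0 = free, 1 = held */
};

/* Try once to take the lock; returns true on success. */
static bool sketch_trylock(struct sketch_lock *l)
{
    int expected = 0;

    /* Relaxed read-modify-write: the ordering comes from the fence. */
    if (!atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed))
        return false;

    /* Lock side: acquire fence (assumed to map to lwsync on powerpc). */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

/* Spin until the lock is taken. */
static void sketch_lock_acquire(struct sketch_lock *l)
{
    while (!sketch_trylock(l))
        ;
}

static void sketch_lock_release(struct sketch_lock *l)
{
    /* Sanity check: only a held lock may be released. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);

    /* Unlock side: full fence (assumed to map to sync on powerpc). */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}
```

The point of the sketch is only that the heavier barrier sits
unconditionally on the unlock side, which is why a separate conditional
sync in unlock would no longer be needed.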

These days you can download the ISA [PDF] with no registration or
anything required. Here's a direct link:

https://ibm.ent.box.com/index.php?rm=box_download_shared_file&shared_name=1hzcwkwf8rbju5h9iyf44wm94amnlcrv&file_id=f_253346319281

That has some docs on locking in Book3 appendix B.2 (page 915). It
doesn't really have that much detail, but it does talk a bit about using
lwsync vs isync.

cheers