Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Linus Torvalds
Date: Mon Jul 16 2018 - 15:30:24 EST


On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
>
> If the numbers can be trusted it is actually slower to put the sync in
> lock, at least on one of the machines:
>
> Time
> lwsync_sync 84,932,987,977
> sync_lwsync 93,185,930,333

Very funky.

> I guess arguably it's not a very macro benchmark, but we have a
> context_switch benchmark in the tree[1] which we often use to tune
> things, and it degrades badly. It just spins up two threads and has them
> ping-pong using yield.

I hacked that up to run on x86, and it only is about 5% locking
overhead in my profiles. It's about 18% __switch_to, and a lot of
system call entry/exit, but not a lot of locking.

I'm actually surprised it is even that much locking, since it seems to
be single-cpu, so there should be no contention and the lock (which
seems to be

rq = this_rq();
rq_lock(rq, &rf);

in do_sched_yield()) should stay local to the cpu.

And for you the locking is apparently even _more_ noticeable.

But yes, a 10% regression on that context switch thing is huge. You
shouldn't do ping-pong stuff, but people kind of do.

Linus