Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes

From: Peter Zijlstra
Date: Thu Jul 14 2016 - 08:11:10 EST


On Thu, Jul 14, 2016 at 07:20:46AM -0400, Tejun Heo wrote:
> On Thu, Jul 14, 2016 at 08:49:56AM +0200, Peter Zijlstra wrote:

> > So the immediate problem with lg-style locks is that the 'local' lock
> > will not stay local: since these are preemptible locks, we can get
> > migrations, etc.
> >
> > All fixable, but still.
>
> In this case, the locks are read-locked only across operations which
> change process hierarchy. They'll occasionally get migrated while
> holding the lock for sure but not often enough to matter.

That means having to change the interface to pass along what 'local' is,
like srcu_read_lock() does: it returns an index that the caller must hand
back to srcu_read_unlock().
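To make the interface point concrete, here is a minimal userspace sketch. It is an illustration under stated assumptions, not code from this thread or from the kernel. The names lg_model_*, NR_SLOTS and the cur_cpu argument are hypothetical: cur_cpu stands in for whatever CPU the reader happens to be on, and pthread mutexes stand in for preemptible per-CPU locks. The lock call returns the slot it took, and the unlock call takes that index back, so a reader that migrates while holding the lock still releases the right slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of an lg-style lock: one mutex per "CPU".
 * NR_SLOTS is an assumed CPU count, not a kernel constant. */
#define NR_SLOTS 4

struct lg_model {
	pthread_mutex_t slot[NR_SLOTS];
};

static void lg_model_init(struct lg_model *lg)
{
	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_init(&lg->slot[i], NULL);
}

/* Reader lock: take the slot for the CPU we are on *now* (cur_cpu is
 * assumed non-negative) and return its index, in the spirit of
 * srcu_read_lock() returning an index. */
static int lg_model_local_lock(struct lg_model *lg, int cur_cpu)
{
	int idx = cur_cpu % NR_SLOTS;

	pthread_mutex_lock(&lg->slot[idx]);
	return idx;
}

/* Reader unlock: release the slot we actually locked. It cannot be
 * recomputed from the current CPU, because a preemptible holder may
 * have migrated in the meantime. */
static void lg_model_local_unlock(struct lg_model *lg, int idx)
{
	assert(idx >= 0 && idx < NR_SLOTS);
	pthread_mutex_unlock(&lg->slot[idx]);
}
```

For example, a reader that calls lg_model_local_lock(&lg, 1), gets preempted and resumes on CPU 3 must still call lg_model_local_unlock(&lg, 1) with the saved index; unlocking "the current CPU's" slot would release slot 3, which it never took.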

> > So the main objection I have is that this isn't a fundamental fix; this
> > only cures things because Android only runs on small machines.
> >
> > If someone with a big computer tries to do the same things, we're up a
> > creek without a paddle. There's just no way we can make a global writer
> > 'fast'.
>
> How so? As the number of cores increases, it'll get proportionally
> more expensive as the same operation is performed on more CPUs;
> however, the latency depends on the slowest one, and it'll get
> higher more often with more CPUs, but not drastically.

A global lock on 4- or 8-socket machines with all 200+ CPUs trying to
use it really stinks.

Remember, they switch cgroups at really rather high rates here, because
of that binder stuff. I don't see how you can defend a global lock here
:/ Global locks only work when writers are extremely rare, and clearly
that premise is false.
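A rough sketch of why the writer side can't be made 'fast' on big machines: in an lg-style scheme the writer has to take every per-CPU lock. The model below is hypothetical (lg_model_*, nr_cpus chosen at runtime, pthread mutexes standing in for preemptible locks) and only illustrates the scaling; it is not the cgroup code under discussion. The writer performs nr_cpus acquisitions, and each one may have to wait for whichever reader currently holds that CPU's slot.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: one mutex per CPU, CPU count chosen at runtime. */
struct lg_model {
	int nr_cpus;
	pthread_mutex_t *slot;
};

static int lg_model_init(struct lg_model *lg, int nr_cpus)
{
	lg->nr_cpus = nr_cpus;
	lg->slot = malloc(sizeof(*lg->slot) * (size_t)nr_cpus);
	if (!lg->slot)
		return -1;
	for (int i = 0; i < nr_cpus; i++)
		pthread_mutex_init(&lg->slot[i], NULL);
	return 0;
}

/* Writer lock: take every per-CPU lock in ascending order (a fixed order
 * keeps two writers from deadlocking). Returns the number of locks taken,
 * which grows linearly with nr_cpus. */
static int lg_model_global_lock(struct lg_model *lg)
{
	for (int i = 0; i < lg->nr_cpus; i++)
		pthread_mutex_lock(&lg->slot[i]);
	return lg->nr_cpus;
}

/* Writer unlock: release all per-CPU locks in reverse order. */
static void lg_model_global_unlock(struct lg_model *lg)
{
	for (int i = lg->nr_cpus - 1; i >= 0; i--)
		pthread_mutex_unlock(&lg->slot[i]);
}

static void lg_model_destroy(struct lg_model *lg)
{
	for (int i = 0; i < lg->nr_cpus; i++)
		pthread_mutex_destroy(&lg->slot[i]);
	free(lg->slot);
}
```

With nr_cpus = 200, every write does 200 lock acquisitions and, while it holds them, readers on all 200 CPUs are blocked; that is cheap only if writes are rare, which is exactly the premise that doesn't hold when cgroups are switched at a high rate.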

Also note that since these are preemptible locks, you can get unbounded
priority inversions.