Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write

From: Benjamin Segall
Date: Mon Jan 29 2024 - 15:36:34 EST


Hillf Danton <hdanton@xxxxxxxx> writes:

> On Fri, 26 Jan 2024 12:40:43 -0800 Benjamin Segall <bsegall@xxxxxxxxxx>
>>
>> I'm fine with "no, fairness is more important than these performance
>> numbers or mitigating already-sorta-broken situations", but it's not
>
> Fine too because your patch is never able to escape standing ovation.
>
> And feel free to specify the broken situations you saw.

The basic locking issue was due to userspace rapidly spawning threads
(or processes) more rapidly than the cpus they are running on can
support, and this causing issues for unrelated threads doing cgroup
operations on other cpus.

The contention can be due to a combination of just straight up spawning
way too many, userspace misconfiguration of cpus allowed, or load
balancer weaknesses. (If you pick minimum cpu.shares values and have
large machines, SCHED_LOAD_RESOLUTION isn't really enough for load
balance to do a good job, and what you're telling the load balancer you
want isn't really a good idea in the first place).

>
>> clear to me you've even understood the patch, because you keep only
>> talking about completely different forms of starvation, and suggesting
>
> Given woken writer in your reply and sem->ww is write waiters, there is
> only one starvation in this thread.

The problem is *not* new readers/writers coming in and taking the lock
before the waiting ones, which is what your patch would solve. The
problem is that some of the woken readers do not get scheduled for a
long time, and nothing can take the lock until they do.