Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling

From: Chris Hyser
Date: Wed Feb 24 2021 - 08:50:36 EST




On 2/24/21 12:15 AM, Josh Don wrote:
On Tue, Feb 23, 2021 at 11:26 AM Chris Hyser <chris.hyser@xxxxxxxxxx> wrote:

On 2/23/21 4:05 AM, Peter Zijlstra wrote:
On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
While trying to test the new prctl() code I'm working on, I ran into a bug I
chased back into this v10 code. Under a fair amount of stress, when the
function __sched_core_update_cookie() is ultimately called from
sched_core_fork(), the system deadlocks or otherwise non-visibly crashes.
I've not had much success figuring out why/what. I'm running with LOCKDEP on
and seeing no complaints. Duplicating it only requires setting a cookie on a
task and forking a bunch of threads ... all of which then want to update
their cookie.

Can you share the code and reproducer?

Attached is a tarball with C code (source) and scripts. Just run ./setup_bug, which will compile the source and start a
bash with a cs cookie. Then run ./show_bug, which dumps the cookie and then fires off some processes and threads. Note
the cs_clone command is not doing any core sched prctls for this test (not needed, and currently coded for a different prctl
interface). It just creates processes and threads. I see this hang almost instantly.
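
For readers without the tarball, the kind of load described above (processes and threads only, no core sched prctls) can be approximated with the sketch below. This is not the attached cs_clone source. The names spawn_threads(), run_stress() and worker() and the spin count are hypothetical, and the sketch assumes it is started from a shell that already carries a cs cookie, so that every fork and thread creation goes through the cookie inheritance path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count of worker threads that ran to completion in this process. */
static atomic_int threads_done;

/* Hypothetical worker: burn a little CPU so the thread actually gets scheduled. */
void *worker(void *arg)
{
    volatile unsigned long spin = 0;

    (void)arg;
    for (unsigned long i = 0; i < 100000; i++)
        spin += i;
    atomic_fetch_add(&threads_done, 1);
    return NULL;
}

/*
 * Create nthreads threads in the calling process (the CLONE_THREAD path)
 * and join them. Returns how many threads finished.
 */
int spawn_threads(int nthreads)
{
    pthread_t *tids;
    int started = 0;

    assert(nthreads > 0);
    tids = calloc((size_t)nthreads, sizeof(*tids));
    if (!tids)
        return 0;

    atomic_store(&threads_done, 0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    return atomic_load(&threads_done);
}

/*
 * Fork nprocs children (the !CLONE_THREAD path); each child spawns
 * nthreads threads. Returns how many children exited cleanly.
 */
int run_stress(int nprocs, int nthreads)
{
    int forked = 0;
    int ok = 0;

    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();

        if (pid < 0)
            break;
        if (pid == 0)
            _exit(spawn_threads(nthreads) == nthreads ? 0 : 1);
        forked++;
    }
    for (int p = 0; p < forked; p++) {
        int status;

        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Build with -pthread and call, for example, run_stress(8, 16) from main(). On an unaffected kernel the call returns the number of children requested; whether these particular counts are enough to trigger the hang is untested here.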

Josh, I did verify that this occurs on Joel's coresched tree both with and w/o the kprot patch and that should exactly
correspond to these patches.

-chrish


I think I've gotten to the root of this. In the fork code, our cases
for inheriting task_cookie are inverted for CLONE_THREAD vs
!CLONE_THREAD. As a result, we are creating a new cookie per-thread,
rather than inheriting from the parent. Now this is actually ok; I'm
not observing a scalability problem with creating this many cookies.

This isn't the issue. The test code generates cases for both CLONE_THREAD and !CLONE_THREAD, and both paths call the cookie update code. The new code I was testing when I discovered this fixed the problem you noted.


However, it means that overall throughput of your binary is cut in
~half, since none of the threads can share a core. Note that I never
saw an indefinite deadlock, just ~2x runtime for your binary vs the
control. I've verified that both a) manually hardcoding all threads to
be able to share regardless of cookie, and b) using a machine with 6
cores instead of 2, both allow your binary to complete in the same
amount of time as without the new API.

This was on a 24 core box. When I run the test, it definitely hangs. I'll answer your other email as well.

-chrish