Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling

From: Chris Hyser
Date: Tue Feb 23 2021 - 14:28:40 EST


On 2/23/21 4:05 AM, Peter Zijlstra wrote:
On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
While trying to test the new prctl() code I'm working on, I ran into a bug I
chased back into this v10 code. Under a fair amount of stress, when the
function __sched_core_update_cookie() is ultimately called from
sched_core_fork(), the system deadlocks or otherwise non-visibly crashes.
I've not had much success figuring out why/what. I'm running with LOCKDEP on
and seeing no complaints. Duplicating it only requires setting a cookie on a
task and forking a bunch of threads ... all of which then want to update
their cookie.

Can you share the code and reproducer?

Attached is a tarball with c code (source) and scripts. Just run ./setup_bug which will compile the source and start a bash with a cs cookie. Then run ./show_bug which dumps the cookie and then fires off some processes and threads. Note the cs_clone command is not doing any core sched prctls for this test (not needed and currently coded for a diff prctl interface). It just creates processes and threads. I see this hang almost instantly.

Josh, I did verify that this occurs on Joel's coresched tree both with and w/o the kprot patch and that should exactly correspond to these patches.

-chrish

Attachment: bug.tar.xz
Description: application/xz