Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling

From: Chris Hyser
Date: Fri Feb 05 2021 - 23:18:07 EST

Next message: Uwe Kleine-König: "[PATCH v2 5/5] dax-device: Make remove callback return void"
Previous message: Daniel Latypov: "[PATCH v2 2/2] kunit: ubsan integration"
In reply to: Peter Zijlstra: "Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling"
Next in thread: Peter Zijlstra: "Re: [PATCH v10 2/5] sched: CGroup tagging interface for core scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/5/21 5:43 AM, Peter Zijlstra wrote:

On Thu, Feb 04, 2021 at 03:52:55PM -0500, Chris Hyser wrote:

A second complication was a decision that new processes (not threads) do not
inherit their parent's cookie. Thus forking is also not a means to share a
cookie. Basically, with a "from-only" interface, the new task would need to
be modified to call prctl() itself. From-only also does not allow for
setting a cookie on an unmodified, already running task. This can be fixed by
providing both a "to" and a "from" sharing interface, which allows helper
programs to construct arbitrary configurations from unmodified programs.

Do we really want to inhibit on fork() or would exec() be a better
place? What about those applications that use fork() based workers?

As a fan of fork-based workers, I like exec time. I suppose the counter-argument would be security, but the new process is in a state where it can be trusted to lower its privileges, change permissions, set its core sched cookie, etc. before it makes itself vulnerable in a different way; and because it is running the same code, it "knows" whether it did.
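To make the fork-versus-exec trade-off concrete, here is a minimal userspace model, assuming a cookie is an opaque `unsigned long` where 0 means "no cookie". `model_fork()`, `model_exec()` and `new_unique_cookie()` are hypothetical names, not kernel code; the model simply moves the v10 "unique cookie instead of the parent's" behaviour from fork() time to exec() time.

```c
#include <assert.h>

/* Hypothetical model: a cookie is an opaque value, 0 means "none". */
struct task {
	unsigned long cookie;
};

/* Stand-in for the kernel allocating a new, never-before-used cookie. */
static unsigned long next_cookie = 1;

static unsigned long new_unique_cookie(void)
{
	assert(next_cookie != 0); /* 0 is reserved for "no cookie" */
	return next_cookie++;
}

/* fork(): the child runs the same code as the parent, so it simply
 * inherits the parent's cookie; fork()-based workers stay grouped. */
static struct task model_fork(const struct task *parent)
{
	struct task child = { .cookie = parent->cookie };
	return child;
}

/* exec(): different code takes over. If the task had a cookie, replace it
 * with a fresh unique one (the v10 fork-time behaviour, moved to exec).
 * A task without a cookie keeps none. */
static void model_exec(struct task *t)
{
	if (t->cookie != 0)
		t->cookie = new_unique_cookie();
}
```

With this split, a pre-forked server keeps all of its workers on one cookie, while a program started via exec() never silently runs with its launcher's cookie.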

Also, how do I set a unique cookie on myself with this interface?

The v10 patch still uses the overloaded v9 mechanism (which, as mentioned
above, is that if two tasks without cookies share a cookie, they get a new
shared unique cookie). Yes, that is clearly inconsistent and kludgy. The
mechanism is documented in the docs, but clearly not obvious from the
I've not seen a document so far (also, I'm not one to actually read
documentation, much preferring comments and Changelogs).

Understood. I think we should split this patch into separate prctl and cgroup patches. The rationale we settle on here would then go into the prctl patch's commit message. We can also use this split to address any dependencies on cgroups that we've created, which you mentioned in a different email.

So, based on the above, how about we add a "create" to pair with "clear", and
call it "create" rather than "set", since we are creating a unique opaque
cookie rather than setting a particular value. And, as mentioned, because one
can't specify a cookie directly but only through sharing relationships, we
need both "to" and "from" to make it completely usable.

So we end up with something like this:
PR_SCHED_CORE_CREATE -- give yourself a unique cookie
PR_SCHED_CORE_CLEAR -- clear your core sched cookie
PR_SCHED_CORE_SHARE_FROM <src_task> -- get their cookie for you
PR_SCHED_CORE_SHARE_TO <dest_task> -- push your cookie to them
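The four commands just listed can be modeled in userspace as below. This is a sketch only: the command names come from the proposal, but the enum values, `struct task` and `sched_core_op()` are hypothetical and do not correspond to any merged kernel ABI. Because cookies are opaque, the model never lets a caller choose a value; it can only create, clear, or copy one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command values; only the names come from the proposal. */
enum sched_core_cmd {
	PR_SCHED_CORE_CREATE,     /* give yourself a unique cookie */
	PR_SCHED_CORE_CLEAR,      /* clear your core sched cookie */
	PR_SCHED_CORE_SHARE_FROM, /* get their cookie for you */
	PR_SCHED_CORE_SHARE_TO,   /* push your cookie to them */
};

struct task {
	unsigned long cookie; /* opaque, 0 means "no cookie" */
};

/* Stand-in for the kernel's cookie allocator. */
static unsigned long next_cookie = 1;

/* 'self' is the calling task; 'other' is only used by the SHARE commands. */
static int sched_core_op(struct task *self, enum sched_core_cmd cmd,
			 struct task *other)
{
	assert(self != NULL);
	switch (cmd) {
	case PR_SCHED_CORE_CREATE:
		self->cookie = next_cookie++;
		return 0;
	case PR_SCHED_CORE_CLEAR:
		self->cookie = 0;
		return 0;
	case PR_SCHED_CORE_SHARE_FROM:
		if (other == NULL)
			return -ESRCH;
		self->cookie = other->cookie;
		return 0;
	case PR_SCHED_CORE_SHARE_TO:
		if (other == NULL)
			return -ESRCH;
		other->cookie = self->cookie;
		return 0;
	}
	return -EINVAL;
}
```

SHARE_TO is what lets a helper configure tasks it cannot modify: it creates a cookie on itself, pushes it to each target, and then clears its own.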

I'm still wondering why we need _FROM/_TO. What exactly will we miss
with just _SHARE <pid>?

current    arg_task
<none>     <none>      -EDAFT
<none>     <cookie>    current gets cookie
<cookie>   <none>      arg_task gets cookie
<cookie>   <cookie>    -EDAFTER

(I have a suspicion, but I want to see it spelled out).

The min requirements of the interface I see are:

1. create my own cookie
2. clear my own cookie
3. create a cookie for a running, unmodified task
4. clear the cookie of a running, unmodified task
5. copy a cookie from one running, unmodified task to another unmodified task

So from your example above:
> <none> <cookie> current gets cookie

could also mean "clear the cookie of arg_task", which would satisfy requirement 4 above.

"Share" is a fuzzy term. I should have used COPY, as that is closer to the semantics I had in mind: a transfer in a specified direction. So we could have a single command with two arguments, where argument order determines direction. In the v10 interface proposal, because one argument (current) was always implied, the direction had to be encoded in the command.

So a single copy command could be something like:

PR_SCHED_CORE_COPY <src_task> <dst_task>

to replace the two. The very first utility you write to do anything useful with all of this is a "copy_cookie <spid> <dpid>". :-)
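A minimal sketch of `PR_SCHED_CORE_COPY <src_task> <dst_task>` and the `copy_cookie <spid> <dpid>` utility, modeled over a small in-memory task table. The table, `find_task()`, `sched_core_copy()`, `parse_pid()` and `copy_cookie()` are all hypothetical stand-ins; a real utility would call prctl(2) with whatever ABI is eventually merged.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

struct task {
	pid_t pid;
	unsigned long cookie; /* opaque, 0 means "no cookie" */
};

/* Hypothetical stand-in for the kernel's task list. */
static struct task tasks[] = {
	{ .pid = 100, .cookie = 0 },
	{ .pid = 200, .cookie = 42 },
	{ .pid = 300, .cookie = 0 },
};

static struct task *find_task(pid_t pid)
{
	for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
		if (tasks[i].pid == pid)
			return &tasks[i];
	return NULL;
}

/* Models PR_SCHED_CORE_COPY: argument order alone gives the direction,
 * and neither task has to be the caller. */
static int sched_core_copy(pid_t src, pid_t dst)
{
	struct task *s = find_task(src), *d = find_task(dst);

	if (s == NULL || d == NULL)
		return -ESRCH;
	d->cookie = s->cookie;
	return 0;
}

/* Parse a decimal PID argument; returns -1 unless it is a positive int. */
static pid_t parse_pid(const char *arg)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (end == arg || *end != '\0' || errno != 0 || v <= 0 || v > INT_MAX)
		return -1;
	return (pid_t)v;
}

/* Body of a hypothetical "copy_cookie <spid> <dpid>" command. */
static int copy_cookie(const char *spid, const char *dpid)
{
	pid_t s = parse_pid(spid), d = parse_pid(dpid);

	if (s < 0 || d < 0)
		return -EINVAL;
	return sched_core_copy(s, d);
}
```

Copying from a task that has no cookie clears the destination, so requirement 4 needs no separate command, and requirement 3 becomes a create on the caller followed by a copy to the target.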

Also, do we want this interface to be able to work on processes? Things
like fcntl(F_SETOWN_EX) allow you to specify a PID type.

Yes and creating/clearing a cookie on a PGID and SID seem like useful shortcuts to look into.
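To illustrate the PGID/SID shortcut, the sketch below applies "create" to every task matching a PID type, loosely in the spirit of the `F_OWNER_TID`/`F_OWNER_PID`/`F_OWNER_PGRP` owner types used by fcntl(F_SETOWN_EX). The `enum pid_scope` values, `struct task` and `sched_core_create_scoped()` are hypothetical. Giving all matching tasks one shared new cookie (as modeled here) rather than one cookie each is an assumption; the thread does not settle it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical scopes, analogous to the F_OWNER_* types of F_SETOWN_EX. */
enum pid_scope { SCOPE_TID, SCOPE_PID, SCOPE_PGID, SCOPE_SID };

struct task {
	pid_t tid, tgid, pgid, sid;
	unsigned long cookie; /* 0 means "no cookie" */
};

/* Return the id of 't' that corresponds to 'scope'. */
static pid_t scope_id(const struct task *t, enum pid_scope scope)
{
	switch (scope) {
	case SCOPE_TID:  return t->tid;
	case SCOPE_PID:  return t->tgid;
	case SCOPE_PGID: return t->pgid;
	case SCOPE_SID:  return t->sid;
	}
	return -1;
}

static unsigned long next_cookie = 1;

/* Give every task whose 'scope' id equals 'id' one new shared cookie.
 * Returns the number of tasks changed, or -ESRCH if none matched. */
static int sched_core_create_scoped(struct task *tasks, size_t n,
				    enum pid_scope scope, pid_t id)
{
	unsigned long cookie = next_cookie;
	int changed = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (scope_id(&tasks[i], scope) == id) {
			tasks[i].cookie = cookie;
			changed++;
		}
	}
	if (changed == 0)
		return -ESRCH;
	next_cookie++; /* only consume the cookie if it was handed out */
	return changed;
}
```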

An additional question is should the inheritability of a process' cookie be
configurable? The current code gives the child process its own unique
cookie if the parent had a cookie. That is useful in some cases, but many
other configurations could be made much easier with inheritance.

What was the argument for not following the traditional fork() semantics
and inheriting everything?

The code just showed up with no explanation :-), but I think I know what was intended, and it touches on the same security-policy problem you mentioned in a comment on the CLEAR code. In a secure context, you can't just allow a random user to clear their cookie, i.e., make themselves trusted. At the same time, in a non-secure context (and several such use cases have been put forward), I can't think of anything more annoying than being able to set a cookie on my task and then not having permission to clear it.

The fork scenario has a similar problem. A child inheriting the cookie means just that, but not inheriting likely means something different depending on whether the context is secure or non-secure (and obviously we can't use those terms; who wants to advocate for non-security? :-). For non-secure, "don't inherit" means the child gets no cookie; for secure, it means the child gets its own unique cookie rather than the parent's. In the absence of any way to set a policy, the current code chose the secure default, which makes sense.

So I guess that raises the ugly question, do we need some kind of globally scoped, "secure/not-secure" core sched policy flag?

-chrish
