Re: question about cpusets vs sched_setaffinity()

From: Chris Friesen
Date: Fri Dec 11 2015 - 18:35:08 EST


On 12/11/2015 04:15 PM, Jason Baron wrote:
> On 12/10/2015 04:30 PM, Chris Friesen wrote:
>
>> If I put a task into a cpuset and then call sched_setaffinity() on it,
>> it will be affined to the intersection of the two sets of cpus. (Those
>> specified on the cpuset, and those specified in the syscall.)
>>
>> However, if I then change the cpus in the cpuset the process affinity
>> will simply be overwritten by the new cpuset affinity. It does not seem
>> to take into account any restrictions from the original
>> sched_setaffinity() call.
>>
>> Wouldn't it make more sense to affine the process to the intersection
>> between the new set of cpus from the cpuset, and the current process
>> affinity? That way if I explicitly masked out certain CPUs in the
>> original sched_setaffinity() call then they would remain masked out
>> regardless of changes to the set of cpus assigned to the cpuset.
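
The difference between the two behaviors can be sketched as a small
userspace model in Python. This is not kernel code: CPU masks are plain
sets, both function names are made up for illustration, and the CPU
numbers are arbitrary.

```python
# Userspace model only: CPU masks are Python sets of CPU numbers.
# Both function names are hypothetical; neither exists in the kernel.

def affinity_after_cpuset_change_current(new_cpuset_cpus, current_affinity):
    """Behavior described above: the new cpuset mask overwrites the affinity."""
    return set(new_cpuset_cpus)

def affinity_after_cpuset_change_proposed(new_cpuset_cpus, current_affinity):
    """Proposed behavior: keep only CPUs present in both masks."""
    return set(new_cpuset_cpus) & set(current_affinity)

# The task sits in a cpuset with CPUs 0-3 and asks for CPUs 0 and 1 only.
cpuset_cpus = {0, 1, 2, 3}
affinity = cpuset_cpus & {0, 1}          # intersection applied at syscall time

# The cpuset is then changed to CPUs 1-3.
new_cpuset_cpus = {1, 2, 3}
overwritten = affinity_after_cpuset_change_current(new_cpuset_cpus, affinity)
intersected = affinity_after_cpuset_change_proposed(new_cpuset_cpus, affinity)
print(sorted(overwritten))   # [1, 2, 3]
print(sorted(intersected))   # [1]
```

What should happen when the intersection comes out empty is not
addressed here, so the sketch leaves that case alone.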

<snip>

> Adding the behavior you are describing would, I think, require another
> cpumask_t field in the task_struct, where we could store the last
> requested mask value for sched_setaffinity() and use that when updating
> the cpus for a cpuset via an intersection, as you described. I think
> adding a task to a cpuset should still wipe out any sched_setaffinity()
> settings, but that would depend on the desired semantics here. It would
> also require a knob so as not to break existing behavior by default.
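
A rough sketch of that bookkeeping, again as a userspace model in Python
and not as kernel code. ModelTask, its fields, and the keep_requested
knob are hypothetical names; "requested" stands in for the extra
cpumask_t, and the fallback for an empty intersection is an assumption
of the sketch.

```python
# Userspace model only. ModelTask, its fields, and the keep_requested knob
# are hypothetical names for illustration; none of them is a kernel interface.

class ModelTask:
    def __init__(self, cpuset_cpus, keep_requested=False):
        self.keep_requested = keep_requested   # the "knob"; off = existing behavior
        self.cpuset_cpus = set(cpuset_cpus)
        self.requested = None                  # stands in for the extra cpumask_t
        self.effective = set(cpuset_cpus)

    def set_affinity(self, cpus):
        """Model of sched_setaffinity(): remember the request, apply the intersection."""
        allowed = set(cpus) & self.cpuset_cpus
        if not allowed:
            # The real syscall fails with EINVAL here; modelled as ValueError.
            raise ValueError("no requested CPU is in the cpuset")
        self.requested = set(cpus)
        self.effective = allowed

    def update_cpuset(self, cpus):
        """Model of changing the cpuset's cpus while the task stays in it."""
        self.cpuset_cpus = set(cpus)
        if self.keep_requested and self.requested is not None:
            both = self.requested & self.cpuset_cpus
            # Falling back to the whole cpuset is an assumption of this sketch.
            self.effective = both or set(self.cpuset_cpus)
        else:
            self.effective = set(self.cpuset_cpus)

    def attach_to_cpuset(self, cpus):
        """Model of moving the task into a cpuset: the stored request is wiped."""
        self.requested = None
        self.cpuset_cpus = set(cpus)
        self.effective = set(cpus)

task = ModelTask({0, 1, 2, 3}, keep_requested=True)
task.set_affinity({2, 3})
task.update_cpuset({0, 1, 2})      # CPU 3 leaves the cpuset
shrunk = sorted(task.effective)    # [2]
task.update_cpuset({0, 1, 2, 3})   # CPU 3 comes back
regrown = sorted(task.effective)   # [2, 3]
print(shrunk, regrown)
```

Keeping the request separate from the effective mask is what lets CPU 3
return once the cpuset grows again; intersecting against the effective
mask alone would have lost it.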

Agreed, the additional field in the task_struct makes sense. Personally, I don't think that adding a task to a cpuset should wipe out any previously set affinity; it should take the intersection in that case as well.

In this environment it might make sense to have separate queries to return the requested and actual affinity.
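
As a small illustration of the two views: sched_getaffinity() already
reports the mask actually in effect, and Python's os.sched_getaffinity()
wraps it on Linux. A query for the requested mask does not exist, so
requested_affinity() below is a made-up stand-in that only returns what
the program recorded for itself.

```python
import os

def actual_affinity(pid=0):
    """CPUs the task may run on right now, via the existing syscall (Linux only)."""
    if not hasattr(os, "sched_getaffinity"):
        return None          # platform without the call
    return set(os.sched_getaffinity(pid))

# Hypothetical: there is no syscall for this; the caller tracks its own request.
_requested = {}

def remember_request(pid, cpus):
    """Record the mask this program intends to ask for."""
    _requested[pid] = set(cpus)

def requested_affinity(pid):
    """Last mask this program recorded, or None if it never recorded one."""
    return _requested.get(pid)

remember_request(os.getpid(), {0, 1})
print("requested:", requested_affinity(os.getpid()))
print("actual:   ", actual_affinity())
```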

> You could also create a child cgroup for the process that you don't want
> to change and set the cpus on that cgroup instead of using
> sched_setaffinity(). Then you change the cpus for the parent cgroup and
> that shouldn't affect the child as long as the child cgroup is a subset.
> But it's not entirely clear to me whether that addresses your use case.

I ended up doing something like this where I had a top-level cpuset and a number of child cpusets, each with an exclusive subset of the CPUs assigned to it. But it meant that I needed more complicated code to figure out which tasks needed to go into which child cpusets, and more complicated code to handle removing a CPU from the top-level cpuset (since you have to remove it from any children first).
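
The ordering constraint on CPU removal can be sketched with a userspace
model in Python. No cgroup files are touched, the class and function
names are hypothetical, and neither sibling exclusivity nor the
assignment of tasks to children is modelled.

```python
# Userspace model only: no cgroup files are touched. The class and function
# names are hypothetical.

class ModelCpuset:
    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)
        self.children = []
        if parent is not None:
            if not self.cpus <= parent.cpus:
                raise ValueError(f"{name}: cpus must be a subset of the parent's")
            parent.children.append(self)

    def set_cpus(self, cpus):
        """Refuse a change that would leave a child with CPUs the parent lacks."""
        cpus = set(cpus)
        for child in self.children:
            if not child.cpus <= cpus:
                raise ValueError(f"{self.name}: child {child.name} still uses "
                                 f"{sorted(child.cpus - cpus)}")
        self.cpus = cpus

def remove_cpu(top, cpu):
    """Remove a CPU bottom-up: children first, then the cpuset itself."""
    for child in top.children:
        remove_cpu(child, cpu)
    top.set_cpus(top.cpus - {cpu})

top = ModelCpuset("top", {0, 1, 2, 3})
a = ModelCpuset("a", {0, 1}, parent=top)
b = ModelCpuset("b", {2, 3}, parent=top)

try:
    top.set_cpus({0, 1, 2})          # CPU 3 is still assigned to child "b"
except ValueError as err:
    print("rejected:", err)

remove_cpu(top, 3)
print(sorted(top.cpus), sorted(a.cpus), sorted(b.cpus))   # [0, 1, 2] [0, 1] [2]
```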

Chris