Re: SCHED_DEADLINE with CPU affinity

From: Philipp Stanner
Date: Tue Dec 24 2019 - 05:01:51 EST


On Wed, 20.11.2019, 09:50 +0100 Juri Lelli wrote:
> Hi Philipp,

Hey Juri,

thanks so far; we could indeed make it work with exclusive cpusets.

> On 19/11/19 23:20, Philipp Stanner wrote:
>
> > from implementing our intended architecture.
> >
> > Now, the questions we're having are:
> >
> > 1. Why does the kernel do this, what is the problem with scheduling
> > with SCHED_DEADLINE on a certain core? In contrast, how is it handled
> > when you have single core systems etc.? Why this artificial
> > limitation?
>
> Please have also a look (you only mentioned manpage so, in case you
> missed it) at
>
> https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
>
> and the document in general should hopefully give you the answer about
> why we need admission control and current limitations regarding
> affinities.
>
> > 2. How can we possibly implement this? We don't want to use
> > SCHED_FIFO, because out-of-control tasks would freeze the entire
> > container.
>
> I experimented myself a bit with this kind of setup in the past and I
> think I made it work by pre-configuring exclusive cpusets (similarly as
> what detailed in the doc above) and then starting containers inside such
> exclusive sets with podman run --cgroup-parent option.
>
> I don't have proper instructions yet for how to do this (plan to put
> them together soon-ish), but please see if you can make it work with
> this hint.

I am afraid I have not yet fully understood why this "workaround"
leads to (presumably) the same results as sched_setaffinity() would.
From what I have read, I understand it as follows: for SCHED_DEADLINE,
admission control tries to guarantee that the requested reservation
can actually be honoured. To do so, it looks at the bandwidth already
reserved by other deadline tasks, taking the number of cores into
account in particular.
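
To check my reading, here is a minimal Python sketch of that test as I
understand it from sched-deadline.rst. It is a model, not kernel code:
the function name admits() is made up, and the 950000/1000000 defaults
are assumed to match the stock sched_rt_runtime_us and
sched_rt_period_us values.

```python
from fractions import Fraction


def admits(tasks, num_cpus, rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Model of the SCHED_DEADLINE admission test for one root domain.

    tasks    -- list of (runtime_us, period_us) reservations, including
                the one that is being requested
    num_cpus -- number of CPUs in the root domain (or exclusive cpuset)

    Returns True if sum(runtime / period) <= num_cpus * rt_runtime / rt_period.
    """
    # Bandwidth the domain may hand out to deadline tasks in total.
    capacity = num_cpus * Fraction(rt_runtime_us, rt_period_us)
    # Bandwidth requested by all reservations together.
    requested = sum(Fraction(runtime, period) for runtime, period in tasks)
    return requested <= capacity


if __name__ == "__main__":
    task = (60_000, 100_000)       # 60 ms of runtime every 100 ms
    print(admits([task] * 3, 2))   # total 1.8 against a cap of 1.9
    print(admits([task] * 4, 2))   # total 2.4 against a cap of 1.9
```

With two cores, three such tasks still fit under the cap, a fourth one
does not.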

Now, with a pre-configured exclusive cpuset, the kernel knows up front
which cores a task can run on, and is therefore able to judge whether a
process can be deadline scheduled or not. With the default setup, on the
other hand, you could start your processes as SCHED_OTHER, switch them
to SCHED_DEADLINE, and later many of them could suddenly call
sched_setaffinity(), all asking to run on the same core and thereby
provoking collisions.
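
The collision I have in mind can be sketched in Python as below. This is
only an illustration with invented numbers; per_cpu_utilization() and
overloaded_cpus() are hypothetical helper names, and pinning is modelled
simply as a CPU index attached to each reservation.

```python
from collections import defaultdict
from fractions import Fraction


def per_cpu_utilization(tasks):
    """Sum runtime/period per CPU for tasks given as (runtime_us, period_us, cpu)."""
    load = defaultdict(Fraction)
    for runtime_us, period_us, cpu in tasks:
        load[cpu] += Fraction(runtime_us, period_us)
    return dict(load)


def overloaded_cpus(tasks):
    """Return the CPUs whose pinned reservations add up to more than 100%."""
    return sorted(cpu for cpu, util in per_cpu_utilization(tasks).items()
                  if util > 1)


if __name__ == "__main__":
    # Three reservations of 60 ms every 100 ms, one per core: no core is
    # asked for more than it has.
    spread = [(60_000, 100_000, cpu) for cpu in (0, 1, 2)]
    print(overloaded_cpus(spread))

    # The same three reservations after all of them pin themselves to
    # core 0: that core would need 180% of its time.
    pinned = [(60_000, 100_000, 0)] * 3
    print(overloaded_cpus(pinned))
```

Seen across the whole machine the three reservations are harmless, but
once they all sit on one core, no scheduler could meet them. If I read
the doc correctly, that is the situation the affinity restriction is
meant to rule out.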

Is my understanding of the situation correct?

Merry Christmas,
P.