Re: [PATCH 7/9] sched: Add migrate_disable()

From: Thomas Gleixner
Date: Wed Sep 23 2020 - 04:31:17 EST


On Mon, Sep 21 2020 at 22:42, Daniel Bristot de Oliveira wrote:
> On 9/21/20 9:16 PM, Thomas Gleixner wrote:
>> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote:
>> But seriously, I completely understand your concern vs. schedulability
>> theories, but those theories cannot deal well with preemption disable
>> either, simply because you can create other trainwrecks when enough low
>> priority tasks run long enough in preempt disabled regions in
>> parallel. The scheduler simply does not know ahead of time how long
>> these sections will take and how many of them will run in parallel.
>>
>> The theories make some assumptions about preempt disable and consider it
>> a temporary priority ceiling, but those are all assumptions, as the
>> bounds of these operations are simply unknown.
>
> Limited preemption is something that is more explored/well known than
> limited/arbitrary affinity - I even know a dude that convinced academics about
> the effects/properties of preempt disable on the PREEMPT_RT!

I'm sure I never met that guy.

> But I think that the message here is that: ok, migrate disable is better for the
> "scheduling latency" than preempt disable (preempt rt goal). But the
> indiscriminate usage of migrate disable has some undesired effects for "response
> time" of real-time threads (scheduler goal), so we should use it with caution -
> as much as we have with preempt disable. In the end, both are critical for
> real-time workloads, and we need more work and analysis on them both.
...
>> But as the kmap discussion has shown, the current situation of enforcing
>> preempt disable even on a !RT kernel is not pretty either. I looked at
>> quite some of the kmap_atomic() usage sites and the resulting
>> workarounds for non-preemptability are pretty horrible especially if
>> they do copy_from/to_user() or such in those regions. There is tons of
>> other code which really only requires migrate disable
>
> (not having an explicit declaration of the reason to disable preemption makes
> these all hard to rework... and we will have the same with migrate disable.
> Anyways, I agree that disabling only migration helps -rt now [and I like
> that]... but I also fear/care for scheduler metrics in the long term... well,
> there is still a long way until retirement.)

Let's have a look at theory and practice once more:

1) Preempt disable

Theories take that into account by adding a SHC ('Sh*t Happens
Coefficient') into their formulas, but the practical effects cannot
ever be reflected in theories accurately.

In practice, preempt disable can cause unbounded latencies and while we
all agree that long preempt/interrupt disabled sections are bad, it's
not really trivial to break these up without rewriting stuff from
scratch. The recent discussion about unbounded latencies in the page
allocator is a prime example of that.

The ever growing usage of per CPU storage is not making anything
better and right now preempt disable is the only tool we have in
mainline to deal with that.

That forces people to come up with code constructs which are more
than suboptimal both in terms of code quality and in terms of
schedulability/latency. We've seen mutexes converted to spinlocks
just because of that, conditionals depending on execution context
which turn out to be broken and inconsistent, massive error handling
trainwrecks, etc.

2) Migrate disable

Theories do not know anything about it, but in the very end it's
going to be yet another variant of SHC to be defined.

In practice migrate disable could be taken into account in placement
decisions, but yes, we don't have anything like that at the moment.

The theoretical worst case which forces all and everything on a
single CPU is an understandable concern, but the practical relevance
is questionable. I surely stared at a lot of traces on heavily loaded
RT systems, but too many preempted migrate disabled tasks were truly
never a practical problem. I'm sure you can create a workload
scenario which triggers that, but then you can always create
workloads which are running into the corner cases of any given
system.

The charm of migrate disable even on !RT is that it allows for
simpler code and breaking up preempt disabled sections, which is IMO
a clear win given that per CPU ness is not going away, unless the
chip industry comes to its senses and goes back to the good old UP
systems which have natural per CPU ness :)
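
To make the difference between the two primitives concrete, here is a
minimal userspace model. It is a sketch only and not kernel code: the
model_*() helpers and struct model_task are hypothetical names made up
for illustration, and the only rules encoded are that preempt disable
blocks both preemption and migration, while migrate disable blocks
migration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one task; NOT the kernel's task_struct. */
struct model_task {
	int cpu;		/* CPU the task currently runs on */
	int preempt_depth;	/* nesting of modeled preempt disable */
	int migrate_depth;	/* nesting of modeled migrate disable */
};

static inline void model_preempt_disable(struct model_task *t) { t->preempt_depth++; }
static inline void model_preempt_enable(struct model_task *t)  { t->preempt_depth--; }
static inline void model_migrate_disable(struct model_task *t) { t->migrate_depth++; }
static inline void model_migrate_enable(struct model_task *t)  { t->migrate_depth--; }

/* Preemption is only blocked by preempt disable. */
static inline bool model_can_preempt(const struct model_task *t)
{
	return t->preempt_depth == 0;
}

/*
 * Migration is blocked by either primitive: a task which cannot be
 * preempted cannot be moved, and a migrate disabled task stays put
 * even after it has been preempted.
 */
static inline bool model_try_migrate(struct model_task *t, int cpu)
{
	if (t->preempt_depth > 0 || t->migrate_depth > 0)
		return false;
	t->cpu = cpu;
	return true;
}
```

Note that in this model, as in the real thing, migrate disable alone
does not serialize against another task which runs on the same CPU in
the meantime. Per CPU data which is shared that way still needs a lock
or a preempt disabled section around the actual update.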

That said, let me paraphrase that dude you mentioned above:

Theories are great and useful, but pragmatism has proven to produce
working solutions even if they cannot work according to theory.

Thanks,

tglx