Re: sched: Move SCHED_DEBUG sysctl to debugfs

From: Peter Zijlstra
Date: Wed Apr 28 2021 - 04:48:03 EST


On Tue, Apr 27, 2021 at 04:59:25PM +0200, Christian Borntraeger wrote:
> Peter,
>
> I just realized that we moved the sysctl tunables to debugfs in next.
> We have seen several cases where it was beneficial to set
> sched_migration_cost_ns to a lower value. For example with KVM I can
> easily get 50% more transactions with 50000 instead of 500000.
> Until now it was possible to use tuned or /etc/sysctl.conf to set
> these things permanently.
>
> Given that some people do not want to have debugfs mounted all the time
> I would consider this a regression. The sysctl tunable was always
> available.
>
> I am ok with the "informational" things being in debugfs, but not
> the tunables. So how do we proceed here?

It's all SCHED_DEBUG; IOW you're relying on DEBUG infrastructure for
production performance, and that's your fail.

I very explicitly do not care to support people that poke random values
into those 'tunables'. If people want to do that, they get to keep any
and all pieces.

The right thing to do here is to analyze the situation and determine why
migration_cost needs changing. Is it an architectural thing: does s390
benefit from less sticky tasks due to its cache setup (the book caches
could be absorbing some of the penalties here, for example)? Or is it
workload related: does KVM intrinsically not care about migrating so
much? Or is it something else?

Basically, you get to figure out what the actual performance issue is,
and then we can look at what to do about it so that everyone benefits,
and not grow some random tweaks on the interweb that might or might not
actually work for someone else.
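
A minimal Python sketch for the measurement side of such an analysis,
not part of the original mail: before comparing benchmark runs it helps
to record which migration cost value is actually in effect. The two
paths below are assumptions (the old sysctl location and the debugfs
file after the move) and may differ on your kernel; read_migration_cost
is a hypothetical helper name.

```python
from pathlib import Path

# Assumed locations, not taken from the mail above: the old sysctl file
# and the debugfs file it was moved to. Adjust if your kernel differs.
CANDIDATES = (
    Path("/proc/sys/kernel/sched_migration_cost_ns"),
    Path("/sys/kernel/debug/sched/migration_cost_ns"),
)


def read_migration_cost(candidates=CANDIDATES):
    """Return (path, value_ns) for the first readable candidate, else None."""
    for path in candidates:
        try:
            return path, int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing file, debugfs not mounted, no permission, or
            # unexpected content: try the next candidate.
            continue
    return None


if __name__ == "__main__":
    found = read_migration_cost()
    if found is None:
        print("migration cost file not found or not readable")
    else:
        print(f"{found[0]}: {found[1]} ns")
```

Logging this next to each benchmark result makes it possible to tell
later whether a difference came from the value itself or from something
else in the setup.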