Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion

From: Qais Yousef
Date: Tue Sep 19 2023 - 20:45:50 EST


On 09/20/23 00:37, Peter Zijlstra wrote:
> On Tue, Sep 19, 2023 at 11:07:08PM +0100, Qais Yousef wrote:
> > On 09/15/23 14:43, peterz@xxxxxxxxxxxxx wrote:
> > > Allow applications to directly set a suggested request/slice length using
> > > sched_attr::sched_runtime.
> >
> > I'm probably as eternally confused as ever, but is this going to be the latency
> > hint too? I find it hard to correlate runtime to latency if it is.
>
> Yes. Think of it as if a task has to save up for it's slice. Shorter
> slice means a shorter time to save up for the time, means it can run
> sooner. Longer slice, you get to save up longer.

Okay, so bias toward latency (short runtime) or throughput (long runtime).

I revisited the paper and can appreciate the importance of the term 'request'
in here.

Is the 95%+ confidence part really mandatory? I can easily see runtime swings
between 2-4ms over a trace for example. Should this task request 4ms as runtime
then? If we request 2ms but we ended up needing 4ms, IIUC we'll be preempted
after 2ms as that's what we requested, right?

What is the penalty for lying if we request 4ms but end up needing 2ms?

> Some people really want longer slices to reduce cache trashing or
> held-lock-preemption like things. Oracle, Facebook, or virt thingies.
>
> Other people just want very short activations but wants them quickly.

Is 3-5ms in the very short region? I think that's the average I see. There are
longer, and shorter, but nothing 'crazy' long.

If we have a bunch of very short tasks stuck on the same rq; IIUC the ones that
actually requested the shorter slice should win as the other will still have
sysctl_sched_base_slice as their request, hence the deadline will seem further
away in spite of not consuming their full slice. And somehow lag will sort
itself to ensure fairness if there were too many wake ups of short-request
tasks (latency wakeup storm).

With this interface it'd be sort of compulsory for users to keep their latency
sensitive tasks short; which maybe is a good thing. The question is how short
do they have to be. Is there a number that can be exposed or deduced/calculated
to help identify/advise users to stay within?

Silly question, do you think this interface is transferable if we move away
from EEVDF in the future for whatever reason? I feel I have to reason about how
EEVDF works to use it, which probably was my first stumbling point as I was
thinking in a more detached/abstract manner.

Sorry, too many questions..


Thanks!

--
Qais Yousef