Re: [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr

From: Youssef Esmat
Date: Mon Oct 02 2023 - 11:55:25 EST


On Fri, Sep 29, 2023 at 11:54 AM Youssef Esmat
<youssefesmat@xxxxxxxxxxxx> wrote:
>
> >
> > EEVDF fundamentally supports per task request/slice sizes, which is the
> > primary motivator for finally finishing these patches.
> >
> > So the plan is to extend sched_setattr() to allow tasks setting their
> > own ideal slice length. But we're not quite there yet.
> >
> > Having just returned from PTO the mailbox is an utter trainwreck, but
> > I'll try and refresh those few patches this week for consideration.
> >
> > In the meantime I think you found the right knob to twiddle.
>
> Hello Peter,
>
> I am trying to understand a little better the need for the eligibility
> checks (entity_eligible). I understand the general concept, but I am
> trying to find a scenario where it is necessary. And maybe propose to
> have it toggled by a feature flag.
>
> Some of my testing:
>
> All my testing was done on a two core Celeron N400 cpu system 1.1Ghz.
> It was done on the 6.5-rc3 kernel with EEVDF changes ported.
>
> I have two CPU bound tasks one with a nice of -4 and the other with a
> nice of 0. They are both affinitized to CPU 0. (while 1 { i++ })
>
> With entity_eligible *enabled* and with entity_eligible *disabled*
> (always returns 1):
> Top showed consistent results, one at ~70% and the other at ~30%
>
> So it seems the deadline adjustment will naturally achieve fairness.
>
> I also added a few trace_printks to see if there is a case where
> entity_eligible would have returned 0 before the deadline forced us to
> reschedule. There were a few such cases. The following snippet of
> prints shows that an entity became ineligible 2 slices before its
> deadline expired. It seems like this will add more context switching
> but still achieve a similar result at the end.
>
> bprint: pick_eevdf: eligibility check: tid=4568,
> eligible=0, deadline=26577257249, vruntime=26575761118
> bprint: pick_eevdf: found best deadline: tid=4573,
> deadline=26575451399, vruntime=26574838855
> sched_switch: prev_comm=loop prev_pid=4568 prev_prio=120
> prev_state=R ==> next_comm=loop next_pid=4573 next_prio=116
> bputs: task_tick_fair: tick
> bputs: task_tick_fair: tick
> bprint: pick_eevdf: eligibility check: tid=4573,
> eligible=1, deadline=26576270304, vruntime=26575657159
> bprint: pick_eevdf: found best deadline: tid=4573,
> deadline=26576270304, vruntime=26575657159
> bputs: task_tick_fair: tick
> bputs: task_tick_fair: tick
> bprint: pick_eevdf: eligibility check: tid=4573,
> eligible=0, deadline=26577089170, vruntime=26576476006
> bprint: pick_eevdf: found best deadline: tid=4573,
> deadline=26577089170, vruntime=26576476006
> bputs: task_tick_fair: tick
> bputs: task_tick_fair: tick
> bprint: pick_eevdf: eligibility check: tid=4573,
> eligible=0, deadline=26577908042, vruntime=26577294838
> bprint: pick_eevdf: found best deadline: tid=4568,
> deadline=26577257249, vruntime=26575761118
> sched_switch: prev_comm=loop prev_pid=4573 prev_prio=116
> prev_state=R ==> next_comm=loop next_pid=4568 next_prio=120
>
> In a more practical example, I tried this with one of our benchmarks
> that involves running Meet and Docs side by side and measuring the
> input latency in the Docs document. The following is the average
> latency for 5 runs:
>
> (These numbers are after removing our cgroup hierarchy - that might be
> a discussion for a later time).
>
> CFS: 168ms
> EEVDF with eligibility: 206ms (regression from CFS)
> EEVDF *without* eligibility: 143ms (improvement to CFS)
> EEVDF *without* eligibility and with a 6ms base_slice_ns (was 1.5ms):
> 104ms (great improvement)
>
> Removing the eligibility check for this workload seemed to result in a
> great improvement. I haven't dug deeper but I suspect it's related to
> reduced context switches on our 2 core system.
> As an extra test I also increased the base_slice_ns and it further
> improved the input latency significantly.
>
> I would love to hear your thoughts. Thanks!

For completeness I ran two more tests:

1. EEVDF with eligibility and 6ms base_slice_ns.
2. EEVDF with eligibility with Benjamin Segall's patch
(https://lore.kernel.org/all/xm261qego72d.fsf_-_@xxxxxxxxxx/).

I copied over all the previous results for easier comparison.

CFS: 168ms
EEVDF, eligib, 1.5ms slice: 206ms
EEVDF, eligib, 6ms slice: 167ms
EEVDF_Fix, eligib, 1.5ms slice: 190ms
EEVDF, *no*eligib, 1.5ms slice: 143ms
EEVDF, *no*eligib, 6ms slice: 104ms

It does seem like Benjamin's fix did have an improvement.