Re: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag

From: Chen Yu
Date: Fri Oct 06 2023 - 22:10:52 EST


Hi David,

On 2023-10-03 at 16:05:11 -0500, David Vernet wrote:
> On Wed, Sep 27, 2023 at 02:59:29PM +0800, Chen Yu wrote:
> > Hi Prateek,
>
> Hi Chenyu,
>
> > On 2023-09-27 at 09:53:13 +0530, K Prateek Nayak wrote:
> > > Hello David,
> > >
> > > Some more test results (although this might be slightly irrelevant with
> > > next version around the corner)
> > >
> > > On 9/1/2023 12:41 AM, David Vernet wrote:
> > > > On Thu, Aug 31, 2023 at 04:15:08PM +0530, K Prateek Nayak wrote:
> > > >
> > > -> With EEVDF
> > >
> > > o tl;dr
> > >
> > > - Same as what was observed without EEVDF but shared_runq shows
> > > serious regression with multiple more variants of tbench and
> > > netperf now.
> > >
> > > o Kernels
> > >
> > > eevdf : tip:sched/core at commit b41bbb33cf75 ("Merge branch 'sched/eevdf' into sched/core")
> > > shared_runq : eevdf + correct time accounting with v3 of the series without any other changes
> > > shared_runq_idle_check : shared_runq + move the rq->avg_idle check before peeking into the shared_runq
> > > (the rd->overload check still remains below the shared_runq access)
> > >
> >
> > I did not see any obvious regression on a Sapphire Rapids server and it seems that
> > the result on your platform suggests that C/S workload could be impacted
> > by shared_runq. Meanwhile some individual workloads like HHVM in David's environment
> > (no shared resource between tasks if I understand correctly) could benefit from
>
> Correct, hhvmworkers are largely independent, though they do sometimes
> synchronize, and they also sometimes rely on I/O happening in other
> tasks.
>
> > shared_runq a lot. This makes me wonder if we can let shared_runq skip the C/S tasks.
>
> I'm also open to this possibility, but I worry that we'd be going down
> the same rabbit hole as what fair.c does already, which is use
> heuristics to determine when something should or shouldn't be migrated,
> etc. I really do feel that there's value in SHARED_RUNQ providing
> consistent and predictable work conservation behavior.
>
> On the other hand, it's clear that there are things we can do to improve
> performance for some of these client/server workloads that hammer the
> runqueue on larger CCXs / sockets. If we can avoid those regressions
> while still having reasonably high confidence that work conservation
> won't disproportionately suffer, I'm open to us making some tradeoffs
> and/or adding a bit of complexity to avoid some of this unnecessary
> contention.
>

Since I did not observe any regression (although I have not tested
hackbench yet) on the latest version you sent me, I'm OK with postponing
the client/server optimization to keep the patchset simple, and Prateek
has another proposal to deal with the regression.

> I think it's probably about time for v4 to be sent out. What do you
> folks think about including:
>

That's fine with me, and I can launch the tests once the latest version is released.

thanks,
Chenyu