Re: [PATCH 0/7] sched/deadline: fix cpusets bandwidth accounting

From: Mathieu Poirier
Date: Thu Oct 12 2017 - 12:57:17 EST


On 11 October 2017 at 10:02, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, Aug 16, 2017 at 03:20:36PM -0600, Mathieu Poirier wrote:
>
>> In this set the problem is addressed by relying on the existing list of tasks
>> (sleeping or not) already maintained by CPUsets.
>
> Right, that's a much saner approach :-)

Luca and Juri had the same opinion so let's continue with that solution.

>
>> OPEN ISSUE:
>>
>> Regardless of how we proceed (using the existing CPUset list or new ones) we
>> need to deal with DL tasks that span more than one root domain, something
>> that will typically happen after a CPUset operation. For example, if we
>> split the number of available CPUs on a system in two CPUsets and then turn
>> off the 'sched_load_balance' flag on the parent CPUset, DL tasks in the
>> parent CPUset will end up spanning two root domains.
>>
>> One way to deal with this is to prevent CPUset operations from happening
>> when such a condition is detected, as enacted in this set. Although simple,
>> this approach feels brittle and akin to a "whack-a-mole" game. A better
>> and more reliable approach would be to teach the DL scheduler to deal with
>> tasks that span multiple root domains, a serious and substantial
>> undertaking.
>>
>> I am sending this as a starting point for discussion. I would be grateful
>> if you could take the time to comment on the approach and most importantly
>> provide input on how to deal with the open issue underlined above.
>
> Right, so teaching DEADLINE about arbitrary affinities is 'interesting'.
>
> Although the rules proposed by Tommaso, if found sufficient, would
> greatly simplify things. Also the online semi-partition approach to SMP
> could help with that.

The "rules" proposed by Tommaso, are you referring to patches or the
deadline/cgroup extension work that he presented at OSPM? I'd also be
interested to know more about this "online semi-partition approach to
SMP" you mentioned. Maybe that's a conversation we could have at the
upcoming RT summit in Prague.

>
> But yes, that's fairly massive surgery. For now I think we'll have to
> live and accept the limitations. So failing the various cpuset
> operations when they violate rules seems fine. Relaxing rules is always
> easier than tightening them (later).

Agreed.

>
> One 'series' you might be interested in when respinning these is:
>
> https://lkml.kernel.org/r/20171011094833.pdp4torvotvjdmkt@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> By doing synchronous domain rebuild we lose a bunch of funnies.

Getting rid of the asynchronous nature of the hotplug path would be a
delight - I'll start keeping track of that effort as well.

Thanks for the review,
Mathieu