Re: [PATCH 0/7] sched/deadline: fix cpusets bandwidth accounting

From: Mathieu Poirier
Date: Fri Aug 25 2017 - 16:29:34 EST


On 25 August 2017 at 08:37, Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> Hi Mathieu,
>
> On Wed, 23 Aug 2017 13:47:13 -0600
> Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> wrote:
>
>> On 22 August 2017 at 06:21, Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
>> > Hi Mathieu,
>>
>> Good day to you,
>>
>> >
>> > On Wed, 16 Aug 2017 15:20:36 -0600
>> > Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> wrote:
>> >
>> >> This is a renewed attempt at fixing a problem reported by Steve Rostedt [1]
>> >> where DL bandwidth accounting is not recomputed after CPUset and CPUhotplug
>> >> operations. When CPUhotplug and some CPUset manipulations take place, root
>> >> domains are destroyed and new ones created, losing at the same time DL
>> >> accounting pertaining to utilisation.
>> >
>> > Thanks for looking at this longstanding issue! I am just back from
>> > vacation; in the next few days I'll try your patches.
>> > Do you have some kind of scripts for reproducing the issue
>> > automatically? (I see that in the original email Steven described how
>> > to reproduce it manually; I just wonder if anyone already scripted the
>> > test).
>>
>> I didn't bother scripting it since it is so easy to do. I'm eager to
>> see how things work out on your end.
>
> I ran some tests with your patchset, and I confirm that it fixes the
> issue originally pointed out by Steven.
>

Good, at least it's a start.

> But I still need to run some more tests (I'll continue on Monday).
>
> I think I found an issue by:
> 1) creating two disjoint cpusets (CPUs 0 and 1 in the first cpuset,
> CPUs 2 and 3 in the second one) and setting sched_load_balance to 0
> 2) starting a task in one of the two cpusets, and making it
> SCHED_DEADLINE <--- up to here, everything looks fine
> 3) setting sched_load_balance to 1 <--- At this point, I think there is
> a bug: the system has only one root domain, and the task utilization
> is summed to it... But the task affinity mask is still the one of
> the "old root domain" that was associated with the cpuset where the
> task is executing.

I can reproduce the problem on my side as well.

This is how CPUset works and is the expected behaviour. For normal tasks
it isn't a problem, but I agree with you that for DL tasks we need to
address this.
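
As a rough illustration of the mismatch in step 3, here is a toy Python
model. It is not kernel code: the Task and RootDomain classes, the
rebuild() helper and the 0.5 utilization are all hypothetical, and the
only thing it tries to show is that the task's bandwidth ends up
accounted in the single merged root domain while its affinity mask
still covers only the CPUs of its original cpuset.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    utilization: float      # runtime / period (hypothetical value)
    affinity: frozenset     # CPUs the task is allowed to run on


@dataclass
class RootDomain:
    span: frozenset         # CPUs covered by this root domain
    total_bw: float = 0.0   # sum of the DL utilizations accounted here


def rebuild(spans, tasks):
    """Create fresh root domains and account each task in the first
    domain whose span contains the task's affinity mask."""
    domains = [RootDomain(frozenset(s)) for s in spans]
    for task in tasks:
        for domain in domains:
            if task.affinity <= domain.span:
                domain.total_bw += task.utilization
                break
    return domains


# Steps 1 and 2: two disjoint cpusets, one DL task in the first one.
task = Task("dl-task", utilization=0.5, affinity=frozenset({0, 1}))
split = rebuild([{0, 1}, {2, 3}], [task])

# Step 3: load balancing is re-enabled, leaving a single root domain.
merged = rebuild([{0, 1, 2, 3}], [task])

# The utilization is summed into the 4-CPU domain, but the affinity
# mask is still the 2-CPU one from the old root domain.
mismatch = task.affinity != merged[0].span
```

In this sketch, mismatch ends up true after step 3, which is the
situation Luca describes above.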

>
> I still need to run some experiments about this.

Thanks for the time,
Mathieu

>
>
>
> Thanks,
> Luca