Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

From: Peter Zijlstra
Date: Thu Mar 21 2019 - 14:01:53 EST


On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
>
> With an extremely short cfs_period_us setting on a parent task group with a
> large number of children, the for loop in sched_cfs_period_timer() can run
> until the watchdog fires. There is no guarantee that the call to
> hrtimer_forward_now() will ever return 0. The large number of children can
> make do_sched_cfs_period_timer() take longer than the period.
>
> To prevent this, we add protection to the loop that detects when the loop
> has run too many times and scales the period and quota up, proportionally,
> so that the timer can complete before the next period expires. This
> preserves the relative runtime quota while preventing the hard lockup.
>
> A warning is issued reporting this state and the new values.
>
> v2: Math reworked/simplified by Peter Zijlstra.
>
> Signed-off-by: Phil Auld <pauld@xxxxxxxxxx>
> Cc: Ben Segall <bsegall@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: Anton Blanchard <anton@xxxxxxxxxx>

Thanks!
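
For anyone reading along in the archive, the mechanism the changelog describes
can be sketched in userspace C roughly as below. This is only an illustration
under stated assumptions, not the patch itself: struct bw_sketch, the
SKETCH_* constants, the doubling factor and the 1s cap are all hypothetical
names and values chosen for the example, and the real loop is driven by
hrtimer_forward_now() rather than by a fixed per-pass cost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bandwidth state (not struct cfs_bandwidth). */
struct bw_sketch {
	uint64_t period_ns;
	uint64_t quota_ns;
};

/* Assumed values for illustration only. */
#define SKETCH_MAX_LOOPS	3
#define SKETCH_MAX_PERIOD_NS	1000000000ULL	/* 1s */

/*
 * Grow the period by num/den (capped) and scale the quota by the same
 * new/old ratio, so quota/period stays unchanged.
 */
static void bw_scale_up(struct bw_sketch *b, uint64_t num, uint64_t den)
{
	uint64_t old = b->period_ns;
	uint64_t new_period;

	assert(old > 0 && den > 0 && num >= den);

	new_period = old * num / den;
	if (new_period > SKETCH_MAX_PERIOD_NS)
		new_period = SKETCH_MAX_PERIOD_NS;

	b->quota_ns = b->quota_ns * new_period / old;
	b->period_ns = new_period;
}

/*
 * Model of the timer loop: while one pass (work_ns) takes at least a full
 * period, the timer would report another overrun and the loop would spin.
 * After more than SKETCH_MAX_LOOPS consecutive passes, scale up and retry.
 * Returns the total number of passes taken.
 */
static unsigned int bw_timer_loop(struct bw_sketch *b, uint64_t work_ns)
{
	unsigned int passes = 0, count = 0;

	while (work_ns >= b->period_ns) {
		passes++;
		if (++count > SKETCH_MAX_LOOPS) {
			if (b->period_ns >= SKETCH_MAX_PERIOD_NS)
				break;	/* cannot grow further; give up */
			bw_scale_up(b, 2, 1);
			count = 0;
		}
	}
	return passes;
}
```

With a 100us period, a 50us quota and a pass that costs 350us, this model
scales twice and settles at a 400us period with a 200us quota, so the 1:2
quota-to-period ratio is kept while the loop is allowed to terminate.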