Re: Stopping the tick on a fully loaded system

From: Frederic Weisbecker
Date: Wed Jul 26 2023 - 06:47:47 EST


On Tue, Jul 25, 2023 at 03:07:05PM +0200, Anna-Maria Behnsen wrote:
> The worst case scenario will not happen, because remote timer expiry only
> happens when the CPU is not active in the hierarchy. And with your proposal
> this is valid after tick_nohz_stop_tick().
>
> Nevertheless, I see some problems with this. But this also depends on
> whether there is a need to change the current idle behavior. Right now,
> these are my concerns:
>
> - The determinism of tick_nohz_next_event() will break: The return value of
> tick_nohz_next_event() will not take into account whether it is the last
> CPU going idle, which then has to take care of remote timers. So the first
> timer of the CPU (regardless of global or local) has to be handed back even
> if it could be handled by the hierarchy.

Bah, of course...

>
> - When moving tmigr_cpu_deactivate() to tick_nohz_stop_tick() and the
> return value of tmigr_cpu_deactivate() is before ts->next_tick, the
> expiry has to be modified in tick_nohz_stop_tick().
>
> - The load is simply moved to a later place - tick_nohz_stop_tick() is
> never called without a preceding tick_nohz_next_event() call. Yes,
> tick_nohz_next_event() is called under load ~8% more often than
> tick_nohz_stop_tick(), but the 'quality' of the return value of
> tick_nohz_next_event() is getting worse.
>
> - The timer migration hierarchy is not a standalone timer infrastructure. It
> only makes sense to handle it in combination with the existing timer
> wheel. When the timer base is idle, the timer migration hierarchy with
> the migrators will do the job for global timers. So, I'm not sure about
> the impact of the changed locking - but I'm pretty sure changing that
> increases the probability for ugly races hidden somewhere between the
> lines.

Sure thing, and this won't be pretty.

>
> Thanks,
>
> Anna-Maria