Re: [PATCH] hrtimer: Unbreak hrtimer_force_reprogram()

From: Thomas Gleixner
Date: Fri Aug 13 2021 - 03:59:00 EST


On Thu, Aug 12 2021 at 22:32, Thomas Gleixner wrote:
> Since the recent consoliation of reprogramming functions,
> hrtimer_force_reprogram() is affected by a check whether the new expiry
> time is past the current expiry time.
>
> This breaks the NOHZ logic as that relies on the fact that the tick hrtimer
> is moved into the future. That means cpu_base->expires_next becomes stale
> and subsequent reprogramming attempts fail as well until the situation is
> cleaned up by an hrtimer interrupts.
>
> For some yet unknown reason this leads to a complete stall, so for now
> partially revert the offending commit to a known working state. The root
> cause for the stall is still investigated and will be fixed in a subsequent
> commit.

So with brain more awake I actually managed to decode the problem. It's
definitely the

expires > cpu_base->expires_next

check. It not only prevents the NOHZ idle case from moving the next
timer interrupt into the future, it also causes the stall when switching
into high resolution / NOHZ mode. At that point the initial base value
can be smaller than the next event which prevents reprogramming and as
the base value stays stale it prevents any further reprogramming unless
there is a full update of the base which makes the problem go away.

TBH, that optimization logic to prevent reprogramming the timer hardware
for nothing is a bit fragile and non-obvious. I'll have a look to make
this more robust and less obscure.

Thanks,

tglx