Re: [BUG, bisect] hrtimer: severe lag after suspend & resume

From: Thomas Gleixner
Date: Mon Jun 08 2015 - 15:32:12 EST


On Mon, 8 Jun 2015, John Stultz wrote:
> On Mon, Jun 8, 2015 at 12:44 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > Well, the problem is that we need to fetch that data on several
> > occasions:
> >
> > - hrtimer_start (if it is the first expiring timer of a clock)
> > - hrtimer_reprogram (after canceling the first timer)
> > - hrtimer_interrupt
>
> hrtimer_interrupt is really the one I'm most concerned with, because
> that's what does timer expiration. For that usage, having duplicated
> time state has been repeatedly problematic.

And hurting performance just to solve problematic issues is definitely
not a solution either.

> Right, so without my proposed patch, this is an issue, but my proposed
> patch requires that the hrtimer_interrupt not use cached offsets in
> order to ensure the read-state is adjusted properly for the
> leapsecond. (Or it requries the hrtimer_interrupt path to also cache
> the leapsecond state so it can do the same adjustment to the cached
> data, but this seems terribly duplicative).

No, we can be smarter than that, really.

> But seriously, I'm earnestly looking for specifics (like which stress
> tests are you caring about) here, so I can try to also watch that
> patches I write or take don't undo your performance optimizations, so
> in the future you'll have to yell at fewer people.

I'm often using cyclictest or a variant of it which can arm more
timers. I play with various configurations for this, so I can't give a
receipe right away. Will try to remember when I play with it next
time.

> Even outside of my leapsecond correctness concern, I think caching
> time state like you're doing here is a maintenance issue. It has bit
> us a number of times already, and things are complicated enough that
> even fixing the issues that stem from it are non-trivial.

The caching is solely a problem for this silly leap second corner
case. So sacrifying performance for the general case just to solve
this particular issue is really the wrong answer. We know that it is
problematic, so we can have tests which verify that it did not break.

Sure it requires a bit more brain cycles to get it right, but that's
well worth spent cycles.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/