Re: [patch 54/55] timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC[_RAW]

From: Thomas Gleixner
Date: Mon Jul 14 2014 - 05:04:53 EST


On Mon, 14 Jul 2014, Peter Zijlstra wrote:
> On Fri, Jul 11, 2014 at 01:45:19PM -0000, Thomas Gleixner wrote:
> > Tracers want a correlated time between the kernel instrumentation and
> > user space. We really do not want to export sched_clock() to user
> > space, so we need to provide something sensible for this.
> >
> > Using separate data structures with a non-blocking sequence count
> > based update mechanism allows us to do that. The data structure
> > required for the readout has a sequence counter and two copies of the
> > timekeeping data.
> >
> > On the update side:
> >
> > tkf->seq++;
> > smp_wmb();
> > update(tkf->base[0], tk);
> > tkf->seq++;
> > smp_wmb();
> > update(tkf->base[1], tk);
> >
> > On the reader side:
> >
> > do {
> > 	seq = tkf->seq;
> > 	smp_rmb();
> > 	idx = seq & 0x01;
> > 	now = now(tkf->base[idx]);
> > 	smp_rmb();
> > } while (seq != tkf->seq);
> >
> > So if an NMI hits the update of base[0] it will use base[1] which is
> > still consistent. In case of CLOCK_MONOTONIC this can result in
> > slightly wrong timestamps (a few nanoseconds) across an update. Not a
> > big issue for the intended use case.
>
> But it breaks monotonicity, right? ;-)

It can in theory, but does it really matter for tracing?
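
For illustration only, the update/read scheme quoted above could be
sketched in userspace C11 roughly as follows. This is not the code from
the patch: struct tk_base, tk_seq_flip(), tk_fast_update() and
tk_fast_now() are hypothetical names, the snapshot is cut down to two
fields with an assumed rate of 1 cycle == 1 ns, a single writer is
assumed, and the C11 fences only approximate smp_wmb()/smp_rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, cut-down timekeeping snapshot (1 cycle == 1 ns assumed). */
struct tk_base {
	_Atomic uint64_t base_ns;	/* time at the last update */
	_Atomic uint64_t cycle_last;	/* counter value at the last update */
};

struct tk_fast {
	atomic_uint	seq;		/* low bit selects the copy readers use */
	struct tk_base	base[2];
};

static void tk_base_set(struct tk_base *b, uint64_t base_ns, uint64_t cycle_last)
{
	atomic_store_explicit(&b->base_ns, base_ns, memory_order_relaxed);
	atomic_store_explicit(&b->cycle_last, cycle_last, memory_order_relaxed);
}

/* Increment seq with release fences on both sides (approximates smp_wmb()). */
static void tk_seq_flip(struct tk_fast *tkf)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&tkf->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Update side, single writer assumed. */
static void tk_fast_update(struct tk_fast *tkf, uint64_t base_ns, uint64_t cycle_last)
{
	tk_seq_flip(tkf);				/* seq odd: readers use base[1] */
	tk_base_set(&tkf->base[0], base_ns, cycle_last);
	tk_seq_flip(tkf);				/* seq even: readers use base[0] */
	tk_base_set(&tkf->base[1], base_ns, cycle_last);
}

/* Reader side: never blocks, retries only if seq changed during the read. */
static uint64_t tk_fast_now(struct tk_fast *tkf, uint64_t cycles)
{
	unsigned int seq;
	uint64_t now;

	do {
		seq = atomic_load_explicit(&tkf->seq, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
		struct tk_base *b = &tkf->base[seq & 0x01];
		now = atomic_load_explicit(&b->base_ns, memory_order_relaxed) +
		      (cycles - atomic_load_explicit(&b->cycle_last, memory_order_relaxed));
		atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	} while (seq != atomic_load_explicit(&tkf->seq, memory_order_relaxed));

	return now;
}
```

A reader that interrupts the writer between the two seq increments sees
an odd seq and therefore reads base[1], which still holds the previous,
consistent snapshot. That is also where the small timestamp error
across an update comes from.
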

> Also, what happens when TSC is not available as a clocksource? There's
> still a metric ton of hardware (including the latest generation HSW)
> that has fucked firmware/TSC.

Well, bad luck then. You end up using hpet or worse, but it's still
your decision whether to base your instrumentation on that or not. For
sane clock sources (i.e. almost anything except TSC) it works
perfectly fine.

Thanks,

tglx