Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

From: Mathieu Desnoyers
Date: Wed Jan 16 2008 - 09:56:19 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
>
> [ CC'd Daniel Walker, since he had problems with this code ]
>
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > I agree with you that I don't see how the compiler could reorder this.
> > So we forget about compiler barriers. Also, the clock source used is a
> > synchronized clock source (get_cycles_sync on x86_64), so it should make
> > sure the TSC is read at the right moment.
> >
> > However, what happens if the clock source is, say, the jiffies ?
> >
> > Is this case, we have :
> >
> > static cycle_t jiffies_read(void)
> > {
> > return (cycle_t) jiffies;
> > }
> >
> > Which is nothing more than a memory read of
> >
> > extern unsigned long volatile __jiffy_data jiffies;
>
> Yep, and that's not my concern.
>

Hrm, I will reply to the rest of this email in a separate mail, but
there is another concern, simpler than memory ordering, that just hit
me :

If we have CPU A calling clocksource_accumulate while CPU B is calling
get_monotonic_cycles, but events happens in the following order (because
of preemption or interrupts). Here, to make things worse, we would be on
x86 where cycle_t is not an atomic write (64 bits) :


CPU A CPU B

clocksource read
update cycle_mono (1st 32 bits)
read cycle_mono
read cycle_last
clocksource read
read cycle_mono
read cycle_last
update cycle_mono (2nd 32 bits)
update cycle_last
update cycle_acc

Therefore, we have :
- an inconsistant cycle_monotonic value
- inconsistant cycle_monotonic and cycle_last values.

Or is there something I have missed ?

If you really want an seqlock free algorithm (I _do_ want this for
tracing!) :) maybe going in the RCU direction could help (I refer to my
RCU-based 32-to-64 bits lockless timestamp counter extension, which
could be turned into the clocksource updater).

Mathieu

> >
> > I think it is wrong to assume that reads from clock->cycle_raw and from
> > jiffies will be ordered correctly in SMP. I am tempted to think that
> > ordering memory writes to clock->cycle_raw vs jiffies is also needed in this
> > case (where clock->cycle_raw is updated, or where jiffies is updated).
> >
> > We can fall in the same kind of issue if we read the HPET, which is
> > memory I/O based. It does not seems correct to assume that MMIO vs
> > normal memory reads are ordered. (pointing back to this article :
> > http://lwn.net/Articles/198988/)
>
> That and the dread memory barrier thread that my head is still spinning
> on.
>
> Ok, lets take a close look at the code in question. I may be wrong, and if
> so, great, we can fix it.
>
> We have this in get_monotonic_cycles:
>
> {
> cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
> do {
> cycle_monotonic = clock->cycle_monotonic;
> cycle_last = clock->cycle_last;
> cycle_now = clocksource_read(clock);
> cycle_delta = (cycle_now - cycle_last) & clock->mask;
> } while (cycle_monotonic != clock->cycle_monotonic ||
> cycle_last != clock->cycle_last);
> return cycle_monotonic + cycle_delta;
> }
>
> and this in clocksource.h
>
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
> cycle_t offset = (now - cs->cycle_last) & cs->mask;
> cs->cycle_last = now;
> cs->cycle_accumulated += offset;
> cs->cycle_monotonic += offset;
> }
>
> now is usually just a clocksource_read() passed in.
>
> The goal is to have clock_monotonic always return something that is
> greater than what was read the last time.
>
> Let's make a few assumptions now (for others to shoot them down). One
> thing is that we don't need to worry too much about MMIO, because we are
> doing a read. This means we need the data right now to contiune. So the
> read being a function call should keep gcc from moving stuff around, and
> since we are doing an IO read, the order of events should be pretty much
> synchronized. in
>
> 1. load cycle_last and cycle_monotonic (we don't care which order)*
> 2. read clock source
> 3. calculate delta and while() compare (order doesn't matter)
>
> * we might care (see below)
>
> If the above is incorrect, then we need to fix get_monotonic_cycles.
>
> in clocksource_accumulate, we have:
>
> offset = ((now = cs->read()) - cycle_last) & cs->mask;
> cycle_last = now;
> cycle_accumulate += offset;
> cycle_monotonic += offset;
>
> The order of events here are. Using the same reasoning as above, the read
> must be first and completed because for gcc it's a function, and for IO,
> it needs to return data.
>
> 1. cs->read
> 2. update cycle_last, cycle_accumulate, cycle_monotonic.
>
> Can we assume, if the above for get_monotonic_cycles is correct, that
> since we read and compare cycle_last and cycle_monotonic, that neither of
> them have changed over the read? So we have a snapshot of the
> clocksource_accumulate.
>
> So the worst thing that I can think of, is that cycle_monotonic is update
> *before* cycle_last:
>
> cycle_monotonic += offest;
> <get_monotonic_cycles run on other CPU>
> cycle_last = now;
>
>
> cycle_last = 5
> cycle_monotonic = 0
>
>
> CPU 0 CPU 1
> ---------- -------------
> cs->read() = 10
> offset = 10 - 5 = 5
> cycle_monotonic = 5
> cycle_monotonic = 5
> cycle_last = 5
> cs->read() = 11
> delta = 11 - 5 = 6
> cycle_monotonic and cycle_last still same
> return 5 + 6 = 11
>
> cycle_last = 10
>
> cycle_monotonic = 5
> cycle_last = 10
> cs->read() = 12
> delta = 12 - 10 = 2
> cycle_monotonic and cycle_last still same
> return 5 + 2 = 7
>
> **** ERROR *****
>
> So, we *do* need memory barriers. Looks like cycle_last and
> cycle_monotonic need to be synchronized.
>
> OK, will this do?
>
> cycle_t notrace get_monotonic_cycles(void)
> {
> cycle_t cycle_now, cycle_delta, cycle_monotonic, cycle_last;
> do {
> cycle_monotonic = clock->cycle_monotonic;
> smp_rmb();
> cycle_last = clock->cycle_last;
> cycle_now = clocksource_read(clock);
> cycle_delta = (cycle_now - cycle_last) & clock->mask;
> } while (cycle_monotonic != clock->cycle_monotonic ||
> cycle_last != clock->cycle_last);
> return cycle_monotonic + cycle_delta;
> }
>
> and this in clocksource.h
>
> static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
> {
> cycle_t offset = (now - cs->cycle_last) & cs->mask;
> cs->cycle_last = now;
> smp_wmb();
> cs->cycle_accumulated += offset;
> cs->cycle_monotonic += offset;
> }
>
> We may still get to a situation where cycle_monotonic is of the old value
> and cycle_last is of the new value. That would give us a smaller delta
> than we want.
>
> Lets look at this, with a slightly different situation.
>
> cycle_last = 5
> cycle_monotonic = 0
>
>
> CPU 0 CPU 1
> ---------- -------------
> cs->read() = 10
> offset = 10 - 5 = 5
> cycle_last = 10
> cycle_monotonic = 5
>
> cycle_monotonic = 5
> cycle_last = 10
> cs->read() = 12
> delta = 12 - 10 = 2
> cycle_monotonic and cycle_last still same
> return 5 + 2 = 7
>
>
> cs->read() = 13
> offset = 13 - 10 = 2
> cycle_last = 13
>
> cycle_monotonic = 5
> cycle_last = 13
> cs->read() = 14
> delta = 14 - 13 = 1
> cycle_monotonic and cycle_last still same
> return 5 + 1 = 6
>
> **** ERROR ****
>
> Crap, looks like we do need a stronger locking here :-(
>
> Hmm, I might as well just use seq_locks, and make sure that tracing
> does not hit them.
>
> Thanks!
>
> -- Steve
>

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/