Re: gettimeofday non-monotonic on SMP 2.3.47

From: Boris Okun (bokun@home.com)
Date: Thu Mar 02 2000 - 15:11:28 EST


Andrea Arcangeli wrote:
>
....

> >Andrea, could you explain it to me? I think this code really belongs to
> >do_fast_gettimeoffset() for the following 2 reasons:
> >1) When using do_slow_gettimeoffset() it's erroneous.
>
> do_slow_gettimeoffset is never erroneous, but it's slower and less precise.

I don't mean do_slow_gettimeoffset is erroneous; I mean that applying the
lost_ticks correction after it is wrong. See below.

> So on TSC capable hardware you really prefer to not use it but to use the
> TSC at get-time time (instead of doing slow I/O at each get-time call).
>
> >2) You want to take lost_ticks into account in do_settimeofday() (when
> >undoing do_gettimeoffset()).
>
> We just do that as far as I can tell. See do_settimeofday() that calls
> do_gettimeoffset().

Perhaps I was not clear. do_settimeofday() does call
do_gettimeoffset(), but the lost_ticks code is not run there; it lives in
do_gettimeofday().
I think IA64 has this right and IA32 has it wrong.
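
The asymmetry described above can be sketched with a small user-space
model. This is only an illustration, not the kernel code: the struct, the
function names, and HZ = 100 are all assumptions made for the example. It
shows that if the reader adds a lost-tick correction but the setter only
undoes the offset, a set followed immediately by a get is off by
lost_ticks * (1000000 / HZ) microseconds.

```c
#include <stdint.h>

#define HZ 100                         /* assumed timer frequency */
#define USEC_PER_TICK (1000000 / HZ)

/* Hypothetical stand-in for the kernel's time state (not the real layout). */
struct model_clock {
    int64_t xtime_usec;   /* wall time as of the last tick folded into xtime */
    int64_t offset_usec;  /* what do_gettimeoffset() would return right now */
    int64_t lost_ticks;   /* ticks seen by the timer irq, not yet in xtime */
};

/* Reader: base time + intra-tick offset + ticks still pending. */
int64_t model_gettimeofday(const struct model_clock *c)
{
    return c->xtime_usec + c->offset_usec + c->lost_ticks * USEC_PER_TICK;
}

/* Setter that undoes only the offset, as the message says IA32 does. */
void model_settimeofday_offset_only(struct model_clock *c, int64_t now_usec)
{
    c->xtime_usec = now_usec - c->offset_usec;
}

/* Setter that also undoes the pending ticks, so set-then-get round-trips. */
void model_settimeofday_symmetric(struct model_clock *c, int64_t now_usec)
{
    c->xtime_usec = now_usec - c->offset_usec
                  - c->lost_ticks * USEC_PER_TICK;
}
```

With two pending ticks, the offset-only setter leaves the next read 20000
microseconds ahead of the value just set, while the symmetric one returns
exactly the value that was set.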

>
> >To solve my problem I made the following changes (thanks to Artur
> >Skawina):
> >1) Made tsc_quotient[NR_CPUS] a per processor value
>
> So far the rule has been that on IA32 SMP machines the TSC goes at the same
> speed on all CPUs. If that's no longer true not only do_gettimeoffset
> will break. If the TSC runs at different speeds userspace will break too.
> Things like fftw use the TSC for doing faster runtime benchmarking to

What is fftw?

> dynamically tune the algorithms.
>
> And anyway that change is a noop for your problem (see below).
>
> >2) Move calls to calibrate_tsc() to the same places where we call
> >calibrate_delay
>
> Since you said you have per-cpu tsc_quotient you could as well skip
> calibrate_tsc, so this change is meaningless w.r.t. gettimeofday.

Well, I need values for the per-CPU tsc_quotient entries, so I get them by
calling calibrate_tsc on the different CPUs.
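
As a rough sketch of what a per-CPU quotient buys, the conversion from a
TSC delta to microseconds can be modelled as a 32x32->64 multiply that
keeps the high word, with the quotient defined as 2^32 divided by the
number of TSC cycles per microsecond. That scaling is my assumption about
how the quotient is used; quotient_from_mhz() and tsc_delta_to_usec() are
hypothetical names, and NR_CPUS = 2 is chosen only for the example.

```c
#include <stdint.h>

#define NR_CPUS 2   /* assumed for this example */

/* One quotient per CPU: 2^32 / (TSC cycles per microsecond on that CPU). */
uint32_t tsc_quotient[NR_CPUS];

/* Hypothetical helper: build a quotient from a calibrated rate in MHz.
 * Requires mhz >= 2 so that the result fits in 32 bits. */
uint32_t quotient_from_mhz(uint32_t mhz)
{
    return (uint32_t)((UINT64_C(1) << 32) / mhz);
}

/* Convert a cycle delta measured on `cpu` to microseconds:
 * (delta * quotient) >> 32, i.e. the high word of the 64-bit product. */
uint32_t tsc_delta_to_usec(int cpu, uint32_t delta_cycles)
{
    return (uint32_t)(((uint64_t)delta_cycles * tsc_quotient[cpu]) >> 32);
}
```

If CPU 0 is calibrated at 500 MHz and CPU 1 at 450 MHz, the same delta of
5,000,000 cycles converts to roughly 10 ms on CPU 0 and roughly 11.1 ms on
CPU 1, which is why a single shared quotient would misreport time on one of
them.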

>
> >2) Made last_tsc[NR_CPUS] a full 64 bit, instead of just last_tsc_low.
>
> That's useless since if the timer irq is not run for more than 10msec it's
> a bug and you would lose time anyway if that would happen.
>
> The CPU would have to run at 429 GHz to overflow the low part of the tsc in 10msec

Whether tsc_low overflows in 10ms depends on its value at the beginning
of this 10ms.

> so we still have a few years before we need to use the high part ;). On a
> mean 500 MHz CPU it takes around 8 seconds to overflow the low tsc, and
> 8 seconds is more than enough for IA32 CPUs.
>

But my point is that I sometimes see delta_tsc=tsc_current-tsc_last on the
order of 10^7 to 10^8, and the current code does not get it right (Hmm, have
to think about this, confused).
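
The figures in this exchange can be checked with a short sketch. The helper
names below are hypothetical and this is plain user-space C, not kernel
code. It also illustrates one property of unsigned arithmetic that bears on
the wrap-around question: a 32-bit unsigned subtraction yields the true
elapsed count even when the counter wraps between the two samples, provided
the real elapsed count is below 2^32.

```c
#include <stdint.h>

/* Seconds for a free-running 32-bit counter to wrap at `hz` cycles/sec. */
double seconds_to_wrap32(double hz)
{
    return 4294967296.0 / hz;          /* 2^32 cycles */
}

/* Clock rate at which a full 2^32-cycle wrap fits inside `seconds`. */
double hz_to_wrap32_within(double seconds)
{
    return 4294967296.0 / seconds;
}

/* Elapsed cycles from two samples of the low TSC word. The subtraction is
 * performed modulo 2^32, so a wrap between the samples does not matter as
 * long as fewer than 2^32 cycles actually elapsed. */
uint32_t tsc_low_delta(uint32_t now_low, uint32_t last_low)
{
    return now_low - last_low;
}
```

For example, seconds_to_wrap32(500e6) is about 8.6 seconds and
hz_to_wrap32_within(0.010) is about 429.5 GHz, matching the numbers quoted
above. A delta of 10^7 to 10^8 cycles is far below 2^32, so it fits in the
low word whatever value the counter started from.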

> >3) Made delay_at_last_interrupt[NR_CPUS]
>
> That seems wrong. delay_at_last_interrupt is a per-timer-chip thing and
> there's _only_ one timer chip not one chip per CPU.

Yes, you are right. In fact I did not make that change. Sorry, I am
writing from memory.

>
> >4) Deleted lost_ticks code from do_gettimeofday. Use instead the full
> >difference
> >between current_tsc and last_tsc in do_fast_gettimeoffset(), not just
> >LSB.
>
> That's wrong. lost_ticks accounts for the time that is going to be added to
> xtime but that has not yet been added because bottom halves were inhibited
> at timer irq time. At irq time you have overwritten last_tsc, and with only
> last_tsc you can't calculate lost_ticks. So you can get a wrong result out
> of gettimeofday this way (if the timer irq gets delayed a bit more).

Need to think about this.

> I guess the fact you handle delay_at_last_interrupt[NR_CPUS] wrong, had
> the side effect of hiding the problem.

No I don't, see above.
I'll run some more tests when I get home later today.
I have no problems in UP stock 2.3.47.

My current theory is that lost_ticks is wrong with do_slow_gettimeoffset.
That would explain why your first patch did not work.
And for per-processor TSCs I'll need per-processor lost_ticks.
That would explain why disabling the global lost_ticks makes things better.

Thanks,

Boris
