Re: [RFC patch 0/4] TSC calibration improvements

From: Alok Kataria
Date: Fri Sep 05 2008 - 18:18:26 EST

In reply to: Ingo Molnar: "Re: [RFC patch 0/4] TSC calibration improvements"
Next in thread: Linus Torvalds: "Re: [RFC patch 0/4] TSC calibration improvements"

On Thu, 2008-09-04 at 14:33 -0700, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Thu, 4 Sep 2008, Ingo Molnar wrote:
> > >
> > > hm, unless i'm missing something i think here we still have a small
> > > window for an SMI or some virtualization delay to slip in and cause
> > > massive inaccuracy: if the delay happens _after_ the last
> > > pit_expect_msb() and _before_ the external get_cycles() call. Right?
> >
> > Yes. I had the extra pit_expect_msb() originally, but decided that
> > basically a single-instruction race for somethign that ran without any
> > MSI for 15ms was a bit pointless.
>
> the race is wider than that i think: all it takes an SMI at the last PIO
> access, so the window should be 1 usec, against a 15000 usecs period.
> That's 1 out of 15,000 boxes coming up with totally incorrect
> calibration.
>
> we also might have a very theoretical race of an SMI taking exactly 65
> msecs so that the whole PIT wraps around and fools the fastpath - the
> chance for that would be around 1:300 - assuming we only have to hit the
> right MSB with a ~200 usecs precision). That assumes equal distribution
> of SMI costs which they certainly dont have - most of them are much less
> than 60 msecs. So i dont think it's an issue in practice - on real hw.
>
> But it's still a possibility unless i'm missing something. We could
> protect against that case by reading the IRQ0-pending bit and making
> sure it's not pending after we have done the closing TSC readout.
>
> > But adding another pit_expect_msb() is certainly not wrong.
>

Hi,
I ran the current tree with these patches on my VM setup, for both 32-bit
and 64-bit, around 200 reboots each.
The system entered the FAST calibration mode more often this time,
around 25% of the time.
And I had an interesting case where the calibrated frequency was
1875 MHz compared to the actual ~1866 MHz, an error of about 0.5%.

Now, looking at the code.
Even with this last pit_expect_msb check, I think there can be a case
where an error spanning 114 usec can slip into the TSC calculation.

This can happen if,
in the pit_expect_msb (the one just before the second read_tsc),
we hit an SMI/virtualization event *after* doing the 50 iterations of
the PIT read loop; this allows the pit_expect_msb to succeed when the
SMI returns.

If this SMI/virtualization event spans across the next PIT MSB
increment, while still leaving sufficient time (100 us) for the last
pit_expect_msb to succeed, then the TSC value we read can be off by
(one MSB tick interval) - (time taken for the last pit_expect_msb to
succeed).

i.e. an error of (214 us - 100 us) = 114 us in the 15 msec period, i.e.
an error of 7600 ppm ??
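
To make that arithmetic easy to re-check, here is a minimal sketch in C.
The helper names are made up for illustration and are not from the
patches; the only outside fact assumed is the standard PC PIT input
clock of 1193182 Hz, which gives an MSB step of 256 PIT cycles
(~214.5 us).

```c
#include <assert.h>

/* Assumption: standard PC PIT input clock, in Hz. */
#define PIT_HZ 1193182.0

/* Hypothetical helper: duration in usec of one PIT MSB step (256 cycles). */
static double pit_msb_tick_us(void)
{
    return 256.0 * 1e6 / PIT_HZ;
}

/*
 * Hypothetical helper: relative error in ppm when err_us of unaccounted
 * delay lands inside a measurement window of window_us.
 */
static double error_ppm(double err_us, double window_us)
{
    assert(window_us > 0.0);
    return err_us / window_us * 1e6;
}
```

With the rounded numbers from above, error_ppm(214.0 - 100.0, 15000.0)
comes out at about 7600 ppm; plugging in pit_msb_tick_us() instead of
214 moves it only slightly.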

And, in order for the TSC clocksource to keep correct time (on systems
where the TSC clocksource is usable), the TSC frequency estimate must be
within 500 ppm of its true frequency, otherwise NTP will not be able to
correct it.
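
As a rough illustration of that 500 ppm budget, the sketch below checks
a frequency estimate against a reference value. The function names are
hypothetical, and the ~1866 MHz "true" figure from my test is only
approximate, so treat the result as indicative.

```c
#include <assert.h>

/* Hypothetical helper: signed error of an estimate vs. the true value, in ppm. */
static double freq_error_ppm(double est_khz, double true_khz)
{
    assert(true_khz > 0.0);
    return (est_khz - true_khz) / true_khz * 1e6;
}

/* Hypothetical helper: nonzero when |error| fits within budget_ppm. */
static int within_ppm(double est_khz, double true_khz, double budget_ppm)
{
    double e = freq_error_ppm(est_khz, true_khz);

    if (e < 0.0)
        e = -e;
    return e <= budget_ppm;
}
```

For the case I hit, freq_error_ppm(1875000.0, 1866000.0) is roughly
4800 ppm, so within_ppm(1875000.0, 1866000.0, 500.0) returns 0, i.e.
well outside what NTP could correct.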

So, IMHO we should not use this algorithm.

I don't know if increasing the count threshold will help either, since
that threshold value may fail for some systems which perform better than
our assumption of "we take 2us to do the 2 PIT reads". At least in a
virtualized environment I can make no such guarantees.

Thanks,
Alok

> ok, i kept that bit.
>
> Ingo

