Re: large, spurious[?] TSC skews on AMD 760MPX boards

From: Andi Kleen
Date: Tue Jul 27 2004 - 12:29:01 EST


xiphmont@xxxxxxxx (Monty) writes:

> Ever since getting my first dual Athlon, the system timer was 'not
> quite right' when running at stock speed. Selects, alarms, etc, had a
> strange way of firing fractions of a second or several seconds 'too
> late'. I discovered that overclocking by about 10% made the problem

That actually points away from the TSC. select and alarm use the jiffies
clock, which is driven by the PIT in the southbridge. AFAIK they never
rely on the TSC.

> magically go away. I've never been entirely comfortable doing that,
> but three dual athlons later (all 760MPX-B2 based boards of different
> makes), it was always the only way to make the problem disappear and I
> didn't think more about it.
>
> Now that I'm on #3, it is not stable at the overclock I need to make
> the system timer problem disappear, so I finally started hunting for
> the cause. Whenever I run the system stock, I see:
>
> Jul 20 21:48:26 Snotfish kernel: checking TSC synchronization across CPUs:
> Jul 20 21:48:26 Snotfish kernel: BIOS BUG: CPU#0 improperly initialized, has 6282588 usecs TSC skew! FIXED.
> Jul 20 21:48:26 Snotfish kernel: BIOS BUG: CPU#1 improperly initialized, has -6282588 usecs TSC skew! FIXED.
>
> When the system is running 'properly', that is to say, overclocked:
>
> Jul 21 22:08:01 Snotfish kernel: checking TSC synchronization across CPUs: passed.
>
> This behavior is reproducible on all three of my 760MPX systems (One
> Gigabyte GA-7DPXDW-P, and two MSI K7D Master-L). The amount of the
> reported skew varies in the stock case, but it's always large. Note
> that once in a blue moon, the system will come up with no TSC skew at
> stock timings, and the system timer issues seem to disappear.
>
> What is the proper route to go about debugging this problem, as I have
> it bottled up here in a reproducibility cage?

Assuming it is the TSC:

You could write a multithreaded program that polls the TSCs on both of
your CPUs for a long time and check the drift rate. The kernel tries to
fix the skew at boot time, but it cannot correct for TSCs that drift
apart afterwards.
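
A rough sketch of such a polling program, not a tested tool. It assumes
x86, Linux with glibc, and that the two CPUs are numbered 0 and 1. All
names in it (measure_drift, measure_skew, skew_from_sample, ...) are
made up for illustration. One thread is pinned to each CPU; CPU 0 reads
its TSC, asks CPU 1 for a reading, reads its own TSC again, and compares
the remote value against the midpoint of the two local ones. The result
is in cycles and is only as good as the round trip is short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for CPU_SET and pthread_setaffinity_np */
#endif
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read the time stamp counter of the CPU we are running on (x86 only). */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Handshake state: 0 = idle, 1 = request sent, 2 = reply ready, 3 = quit. */
static atomic_int state;
static uint64_t remote_tsc;     /* written by the responder before state = 2 */

static int pin_thread(pthread_t t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}

/* Runs on CPU 1 (assumed): answer each request with the local TSC value. */
static void *responder(void *unused)
{
    (void)unused;
    for (;;) {
        int s = atomic_load(&state);
        if (s == 3)
            return NULL;
        if (s == 1) {
            remote_tsc = rdtsc();
            atomic_store(&state, 2);
        }
    }
}

/* Remote reading minus the midpoint of the two local readings around it. */
static int64_t skew_from_sample(uint64_t t0, uint64_t remote, uint64_t t1)
{
    return (int64_t)(remote - t0) - (int64_t)((t1 - t0) / 2);
}

/* Keep the exchange with the shortest round trip; its error bound is smallest. */
static int64_t measure_skew(void)
{
    uint64_t best_rtt = UINT64_MAX;
    int64_t best = 0;
    for (int i = 0; i < 1000; i++) {
        uint64_t t0 = rdtsc();
        atomic_store(&state, 1);
        while (atomic_load(&state) != 2)
            ;                   /* busy-wait for the other CPU */
        uint64_t t1 = rdtsc();
        uint64_t remote = remote_tsc;
        atomic_store(&state, 0);
        if (t1 - t0 < best_rtt) {
            best_rtt = t1 - t0;
            best = skew_from_sample(t0, remote, t1);
        }
    }
    return best;
}

/* Print the CPU1-minus-CPU0 skew once per interval; returns -1 if pinning fails. */
int measure_drift(int rounds, unsigned interval_sec)
{
    pthread_t th;
    if (pin_thread(pthread_self(), 0) != 0)
        return -1;
    atomic_store(&state, 0);
    if (pthread_create(&th, NULL, responder, NULL) != 0)
        return -1;
    if (pin_thread(th, 1) != 0) {
        atomic_store(&state, 3);
        pthread_join(th, NULL);
        return -1;
    }
    int64_t first = measure_skew();
    printf("round 0: skew %lld cycles\n", (long long)first);
    for (int r = 1; r < rounds; r++) {
        sleep(interval_sec);
        int64_t now = measure_skew();
        printf("round %d: skew %lld cycles, change since start %lld\n",
               r, (long long)now, (long long)(now - first));
    }
    atomic_store(&state, 3);
    pthread_join(th, NULL);
    return 0;
}
```

Call measure_drift(600, 1) from a main() and build with
"gcc -O2 -pthread". A "change since start" column that keeps growing
would mean the TSCs really are drifting; a large but constant skew would
only be an offset. Divide cycles by the CPU clock in MHz to get usecs.
Note that the responder thread spins and keeps CPU 1 busy for the whole
run.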

One way to work around it would be to boot with "notsc". This will
make your gettimeofday() slower and less accurate, though.

In theory you could resync the TSCs from a regularly running timer,
but that would probably add quite a bit of overhead.

Assuming it is not:

Then something is wrong with the PIT in your southbridge. Maybe
just run ntpd?

I know that later AMD chipsets - in particular the 8111 - are rather
poor timekeepers, so it is a good idea to always run NTP on them.

-Andi
