Re: can TSC tick with different speeds on SMP?

From: Grzegorz Kulewski
Date: Mon Jun 21 2004 - 14:41:02 EST


On Mon, 21 Jun 2004, Kirill Korotaev wrote:

> Hello,
>
> I've got a stupid question for the SMP gurus and would be very thankful for the details. I recently came across an SMP system where different P4 CPUs were installed (with different steppings). This resulted in different CPU clock speeds and different speeds of the time stamp counters on these CPUs. I ran into the problem during some timings I measured in the kernel.

IANSG (= I am not SMP guru).

> So the question is "is such a system compliant with the SMP specification?".
> In old kernels there was code to synchronize the TSCs and to detect if they were screwed up. Current kernels do not have such code. Is it intentional? I suppose there is some code in the kernel which won't work fine on such systems (real-time threads, timing accounting and so on).

A friend of mine has an SMP system with 2x P III. He put two different CPUs
into it (one 700 MHz and one 750 MHz). This resulted in a fast-growing
difference between the time on them. This is potentially dangerous for
databases and other time-driven programs. But there is a fix: boot with the
notsc kernel parameter.

There is one problem, though. Everything is OK as long as you have an
i386-compiled (that means NOT i686) glibc. If you do not, init will stop
with a segmentation fault or something like that. We are using Gentoo, so
his glibc was compiled for i686 and did not work. He did not want to
recompile it for i386, so I fixed glibc to work with notsc on i686. I will
not provide a patch, because I made the fix against some CVS version that
Gentoo used at the time. Maybe he has a more recent patch. I will ask him
and post it if he has one.

But the change is very simple. Grep glibc's sources for TSC-related
assembly instructions and then kill them in a smart way. I found two
places. One was in some malloc debugging code and the other was in a
header. The header is simple to fix: overwrite it with its no-op version
from the i386 (or something like that) directory. Then disable the second
use in the malloc debugging code (for example by setting the define that
says you have no precise timing source on your machine). Then recompile
glibc and you are done. My friend's server has an uptime of about 2 months
with no clock problems, so this is probably The Right Fix (TM). I am
writing these words using it, so I know...
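
To get a feeling for how quickly a 700/750 MHz mismatch adds up, here is a
minimal Python sketch. It rests on a simplifying assumption that is not
stated in this thread: TSC ticks from both CPUs are converted to time using
a single calibrated frequency (here that of the 700 MHz CPU). The function
and parameter names are illustrative only, and real kernels may behave
differently.

```python
def tsc_drift_seconds(calibrated_mhz, actual_mhz, elapsed_s):
    """Return how far a TSC-derived clock runs ahead (+) or behind (-).

    Simplified model (an assumption, not kernel code): the CPU's TSC ticks
    at actual_mhz, but ticks are converted to seconds using calibrated_mhz.
    """
    ticks = actual_mhz * 1e6 * elapsed_s          # ticks counted in real time
    apparent_s = ticks / (calibrated_mhz * 1e6)   # time those ticks "look like"
    return apparent_s - elapsed_s


# The 750 MHz CPU measured against a 700 MHz calibration, over one hour.
drift_per_hour = tsc_drift_seconds(700.0, 750.0, 3600.0)
print(f"{drift_per_hour:.1f} s ahead per hour")  # prints "257.1 s ahead per hour"
```

Under this model the faster CPU gains roughly 71 ms for every real second,
which is why the difference is visible almost immediately.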

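The "grep the sources" step can be sketched in Python as well. This is only
a rough helper under stated assumptions: it looks for the literal mnemonic
rdtsc in .c, .h and .S files, which may miss other TSC-related code, and
the function name find_tsc_uses is hypothetical, not part of glibc or of
the fix described above.

```python
import os
import re

# Assumption: TSC reads show up as the "rdtsc" mnemonic in inline assembly.
TSC_PATTERN = re.compile(r"\brdtsc\b", re.IGNORECASE)
SOURCE_SUFFIXES = (".c", ".h", ".S")


def find_tsc_uses(root):
    """Walk a source tree and return (path, line number, line) for each hit."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on oddly encoded files.
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if TSC_PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling find_tsc_uses() on an unpacked glibc source directory returns the
candidate places to inspect by hand; it does not change anything itself.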

Grzegorz Kulewski
