Re: [PATCH 0/4] x86/platform/UV: Update TSC support

From: Mike Travis
Date: Fri Sep 29 2017 - 11:19:30 EST




On 9/29/2017 1:46 AM, Peter Zijlstra wrote:
On Thu, Sep 28, 2017 at 01:03:39PM -0500, mike.travis@xxxxxxx wrote:

The UV BIOS goes to considerable effort to get the TSC synchronization
accurate across the entire system. Included in that are multiple chassis
that can have 32+ sockets. The architecture does support an external
high resolution clock to aid in maintaining this synchronization.

The resulting TSC accuracy set by the UV system BIOS is much better
than the generic kernel TSC ADJUST functions. This is important for
applications that read the TSC values directly for accessing data bases.

* These patches disable an assumption made by the kernel tsc sync
functions that Socket 0 in the system should have a TSC ADJUST
value of zero. This is not correct when the chassis are reset
asynchronously to each other so which TSC's should be zero is
not predictable.

* When the system BIOS determines that the TSC is not stable, it then
sets a flag so the UV kernel setup can set the "tsc is unstable"
flag. A patch now prevents the kernel from attempting to fix the
TSC causing a slew of warning messages.

* It also eliminates another avalanche of warning messages from older
BIOS that did not have the TSC ADJUST MSR (ex. >3000 msgs in a 32
socket Skylake system). It now notes this with a single warning
message and then moves on with fixing them.

So I would still like to get clarification on how ART works (or likely
doesn't) on your systems. I think for now its fairly prudent to kill
detect_art() on UV.

I tested with both detect_art enabled and disabled and didn't notice a difference though I wasn't sure what test to run to verify whether it was being used or not. (I'd be glad to run some specific test if one can be suggested?) The num/denom setting for a 2100Mhz CPU was 168/2 if that information helps?

Also, while indeed not strictly required, that TSC_ADJUST==0 test on
bootcpu is nice for consumer systems, BIOS did something 'weird' if that
is not true. Is something like is_uv_system() available early enough?

My previous version of the patches had me setting a flag that could be checked by the tsc_sanitize_first_cpu() function and disable the requirement of "TSC == 0 on socket 0" for any arch that specified it.
(And UV did set that flag.)

But Thomas said it was "hackery" and that TSC being 0 on socket 0 was no longer a requirement. So I took it out for this version and made the "TSC == 0 on socket 0" no longer the default for any arch.

Other than that, the patches look good to me.