Re: [PATCH RFC] x86/tsc: Make recalibration default on for TSC_KNOWN_FREQ cases

From: Feng Tang
Date: Mon May 22 2023 - 04:54:42 EST


Hi Thomas,

Thanks for the review!

On Mon, May 22, 2023 at 10:14:08AM +0200, Thomas Gleixner wrote:
> On Mon, May 22 2023 at 11:30, Feng Tang wrote:
> > Commit a7ec817d5542 ("x86/tsc: Add option to force frequency
> > recalibration with HW timer") was added to handle cases that the
> > firmware has bug and provides a wrong TSC frequency number, and it
> > is optional given that this kind of firmware issue rarely happens
> > (Paul reported once [1]).
> >
> > But Rui reported that some Sapphire Rapids platform met this issue
> > again recently, and as firmware is also a kind of 'software' which
> > can't be bug free, make the recalibration default on. When the
> > values from firmware and HW timer's calibration have big gap,
> > raise a warning and let vendor to check which side is broken.
>
> Sure firmware can have bugs, but if firmware validation does not even
> catch such a trivially to detect bug, then their validation is nothing
> else than rubber stamping. Seriously.

Yes, agree.

> Are any of these affected platforms shipping already or is this just
> Intel internal muck?

Paul and Rui can provide more info. AFAIK, those problems were raised
by external customers, so the platform were already shipped from
Intel. But I'm not sure they are commercial versions or early
engineering drops.

>
> > One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set,
> > and they will also do this recalibration.
>
> It's also pointless for those SoCs which lack legacy hardware.

Yes.

> So why do you force this on everyone?

How about we keep the optional parameter, and enforce the check for
bare metal platforms which got TSC frequency info from CPUID(0x15),
like:

---
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 344698852146..ec1ff6aaf5a9 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -670,8 +670,10 @@ unsigned long native_calibrate_tsc(void)
* frequency and is the most accurate one so far we have. This
* is considered a known frequency.
*/
- if (crystal_khz != 0)
+ if (crystal_khz != 0) {
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+ tsc_force_recalibrate = 1;
+ }

/*
* Some Intel SoCs like Skylake and Kabylake don't report the crystal

Thanks,
Feng

> Thanks,
>
> tglx