Re: [PATCH] x86/tdx: Mark TSC reliable

From: Sean Christopherson
Date: Fri Aug 25 2023 - 11:17:18 EST


On Fri, Aug 25, 2023, Kirill A. Shutemov wrote:
> On Thu, Aug 24, 2023 at 09:31:39PM +0200, Thomas Gleixner wrote:
> > On Tue, Aug 08 2023 at 23:01, Kirill A. Shutemov wrote:
> > > On Tue, Aug 08, 2023 at 10:13:05AM -0700, Dave Hansen wrote:
> > >> On 8/8/23 09:23, Kirill A. Shutemov wrote:
> > >> ...
> > >> > On the other hand, other clock sources (such as HPET, ACPI timer,
> > >> > APIC, etc.) necessitate VM exits to implement, resulting in more
> > >> > fluctuating measurements compared to TSC. Thus, those clock sources
> > >> > are not effective for calibrating TSC.
> > >>
> > >> Do we need to do anything to _those_ to mark them as slightly stinky?
> > >
> > > I don't know what the rules are here. As far as I can see, all other clock
> > > sources relevant for TDX guest have lower rating. I guess we are fine?
> >
> > Ideally they are not enumerated in the first place, which prevents the
> > kernel from trying.
>
> We can ask QEMU/KVM not to advertise them to TDX guest, but guest has to
> protect itself as the VMM is not trusted. And we are back to device
> filtering...
>
> > > A notable exception to the rating order is kvmclock, which is rated higher
> > > than tsc.
> >
> > Which is silly aside of TDX.

It made a lot more sense back when stable TSCs weren't a thing, which is why
kvmclock_init() drops its rating below the TSC's "300" rating when the TSC is
constant, nonstop, and not marked unstable.

	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
	    !check_tsc_unstable())
		kvm_clock.rating = 299;

Xen and Hyper-V also have paravirt clocks whose ratings are initially higher
than the TSC's, but xen_time_init() and hv_init_tsc_clocksource() similarly
lower those ratings when the TSC is deemed to be safe/stable.

Note, because KVM clock isn't marked VALID_FOR_HRES, even if kvmclock didn't
drop its rating, most guests will end up selecting the TSC anyways.
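
To make the interaction between the rating and VALID_FOR_HRES concrete, here
is a rough userspace sketch. It is a toy model, not the kernel code: struct
clk_model, kvmclock_rating() and pick_clock() are made-up names for
illustration, the "400" starting rating for kvmclock is an assumption on my
part, and the selection loop is only loosely modeled on how the clocksource
core picks the best-rated clock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct clocksource. */
struct clk_model {
	const char *name;
	int rating;
	bool valid_for_hres;	/* models the VALID_FOR_HRES flag */
};

/*
 * Models the rating drop in kvmclock_init(): if the TSC is constant,
 * nonstop, and not marked unstable, rate kvmclock just below the TSC's 300.
 * Otherwise keep the initial rating passed in by the caller.
 */
static int kvmclock_rating(int initial_rating, bool constant_tsc,
			   bool nonstop_tsc, bool tsc_unstable)
{
	if (constant_tsc && nonstop_tsc && !tsc_unstable)
		return 299;
	return initial_rating;
}

/*
 * Pick the highest-rated clock.  When high-resolution mode is needed,
 * clocks that are not valid for hres are skipped regardless of rating.
 * Returns NULL if no clock qualifies.
 */
static const struct clk_model *pick_clock(const struct clk_model *clks,
					  size_t n, bool need_hres)
{
	const struct clk_model *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (need_hres && !clks[i].valid_for_hres)
			continue;
		if (!best || clks[i].rating > best->rating)
			best = &clks[i];
	}
	return best;
}
```

With a "kvm-clock" entry left at its higher initial rating but not valid for
hres, and a "tsc" entry rated 300 that is valid for hres, pick_clock() with
need_hres set returns the TSC, and only returns kvmclock when hres isn't
required.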

> > > It has to be disabled, but it is not clear to me how. This topic
> > > is related to how we are going to filter allowed devices/drivers, so I
> > > would postpone the decision until we settle on wider filtering schema.
> >
> > TDX aside it might be useful to have a mechanism to select TSC over KVM
> > clock in general.
>
> Sean, Paolo, any comment on this?

I would expect the VMM to not advertise KVM clock if the VM is going to run on
hosts with stable TSCs, i.e. the guest really shouldn't need to do anything.
But I avoid clocks and timekeeping like the plague, so take that with a grain of
salt, e.g. maybe there's a good reason to always advertise kvmclock.

For TDX and other paranoid guests, I assume the kernel command line is captured
as part of attestation. And so the existing "no-kvmclock" param should be
sufficient to ensure the guest doesn't use KVM clock over the TSC, though IIRC
TDX requires a constant, non-stop TSC, so it's likely not strictly necessary.