Re: [PATCH] Guest system time jumps when new vCPUs is hot-added

From: Zelin Deng
Date: Thu Apr 29 2021 - 18:40:11 EST


Got it. Many thanks, Thomas.

On 2021/4/30 12:02 AM, Thomas Gleixner wrote:

On Thu, Apr 29 2021 at 17:38, Zelin Deng wrote:
On 2021/4/29 4:46 PM, Thomas Gleixner wrote:
And that validation expects that the CPUs involved run in a tight loop
concurrently so the TSC readouts which happen on both can be reliably
compared.

But this cannot be guaranteed on vCPUs at all, because the host can
schedule out one or both at any point during that synchronization
check.
Is there any plan to fix this?
The above cannot be fixed.

As I said before the solution is:

A two socket guest setup needs to have information from the host that
TSC is usable and that the socket sync check can be skipped. Anything
else is just doomed to fail in hard to diagnose ways.
Yes, I had tried to add "tsc=unstable" to skip tsc sync.  However if a
tsc=unstable? Oh well.

user process which is not pinned to a vCPU is using rdtsc, it can get tsc
warp, because it can be scheduled among vCPUs.  Does it mean user
Only if the hypervisor fails to do the right thing, which is to make sure
that all vCPUs have the same TSC offset vs. the host TSC.

applications have to guarantee by themselves that they use rdtsc only
when TSC is reliable?
If the TSCs of CPUs are not in sync then the kernel does the right thing
and uses some other clocksource for the various time interfaces, e.g.
the kernel provides clock_gettime() which guarantees to be correct
whether TSC is usable or not.
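
As a minimal sketch of that advice (not part of the original exchange), an
application can take its timestamps through clock_gettime() instead of
executing RDTSC itself. Python's time.clock_gettime_ns() wraps the same
system interface; the helper names monotonic_ns and elapsed_ns are
hypothetical and exist only for this illustration.

```python
import time


def monotonic_ns():
    """Read CLOCK_MONOTONIC through clock_gettime(), in nanoseconds.

    The kernel picks the underlying clocksource, so the caller does not
    need to know whether the TSC is usable on this machine or guest.
    """
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC)


def elapsed_ns(func):
    """Measure how long func() takes using the kernel-provided clock."""
    start = monotonic_ns()
    func()
    return monotonic_ns() - start


if __name__ == "__main__":
    # Time a small piece of work without touching RDTSC directly.
    print("elapsed ns:", elapsed_ns(lambda: sum(range(100_000))))
```

The measured interval stays meaningful even if the task migrates between
vCPUs while it runs, because the kernel, not the application, is
responsible for the clock being consistent.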

Any application using RDTSC directly is on their own and it's not a
kernel problem.
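
For an application that still wants a hint before relying on the TSC, one
possible check (an assumption of this sketch, not something proposed in the
thread) is to read which clocksource the kernel itself selected from sysfs.
The helper names read_clocksource and kernel_uses_tsc are hypothetical. A
result other than "tsc" does not prove the TSC is broken: a guest may simply
prefer a paravirtual clocksource.

```python
from pathlib import Path

# Standard sysfs location of the clocksource attributes on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(name, base=CLOCKSOURCE_DIR):
    """Return the stripped content of a clocksource attribute, or None.

    name is e.g. "current_clocksource" or "available_clocksource".
    None is returned when the file is missing or unreadable.
    """
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


def kernel_uses_tsc(base=CLOCKSOURCE_DIR):
    """True only if the kernel currently uses "tsc" as its clocksource."""
    return read_clocksource("current_clocksource", base) == "tsc"


if __name__ == "__main__":
    print("current:  ", read_clocksource("current_clocksource"))
    print("available:", read_clocksource("available_clocksource"))
    print("kernel uses tsc:", kernel_uses_tsc())
```

This only reports the kernel's own decision at the time of the read; it
gives no guarantee about what a hypervisor does to the TSC afterwards.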

The host kernel cannot make guarantees that the hardware is sane, nor
can a guest kernel make guarantees that the hypervisor is sane.

Thanks,

tglx