Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

From: Jeremy Fitzhardinge
Date: Wed Oct 07 2009 - 15:30:48 EST


On 10/07/09 03:25, Avi Kivity wrote:
> On 10/06/2009 08:46 PM, Jeremy Fitzhardinge wrote:
>>
>>> Instead of using vgetcpu() and rdtsc() independently, you can use
>>> rdtscp to read both atomically. This removes the need for the preempt
>>> notifier.
>>>
>> rdtscp first appeared on Intel with Nehalem, so we need to support older
>> Intel chips.
>>
>
> We can support them by falling back to the kernel.

Yes, but it's easy enough to support them with the fast-path.

> I'm a bit worried about the kernel playing with the hypervisor's
> version field.

For Xen I explicitly made it not a problem by adding the notion of a
secondary pvclock_vcpu_time_info structure which is updated by copying,
aside from the version number which is updated as-is.

As far as I can tell it isn't a problem for KVM either. The guest
version number is atomic with respect to preemption by the hypervisor so
there's no scope for racing. (The ABI already guarantees that the
pvclock structures are never updated cross-cpu.)

It ultimately doesn't matter what the version number is so long as it
changes when the parameters are updated, and version numbers can't be
reused within a window where things get confused.
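
To make the version requirement concrete, here is a minimal sketch of the
reader/writer protocol in Python. The TimeInfo class, the update() and
read_params() helpers and the between_reads hook are hypothetical names
made up for illustration, and the "odd version means update in progress"
convention is an assumption of the sketch rather than something the
argument above depends on.

```python
from dataclasses import dataclass


@dataclass
class TimeInfo:
    """Hypothetical stand-in for one per-vcpu pvclock record."""
    version: int = 0
    tsc_timestamp: int = 0
    system_time: int = 0


def update(info, tsc_timestamp, system_time):
    """Writer side: the version changes whenever the parameters change."""
    info.version += 1              # assumed convention: odd = update in progress
    info.tsc_timestamp = tsc_timestamp
    info.system_time = system_time
    info.version += 1              # even again = update complete


def read_params(info, between_reads=None):
    """Reader side: retry until the version is stable across the read.

    between_reads is a test-only hook, called once, that simulates an
    update landing between the two version reads.
    """
    while True:
        v0 = info.version
        params = (info.tsc_timestamp, info.system_time)
        if between_reads is not None:
            hook, between_reads = between_reads, None
            hook()
        if v0 % 2 == 0 and info.version == v0:
            return params
```

The reader only compares versions for equality, so the actual values are
irrelevant as long as a given value is not reused while a reader is still
inside its loop.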

> It's better to introduce yet a new version for the kernel, and check
> both.

Two version numbers are awkward to read atomically, at least on 32-bit.
And I don't think it's necessary.

> def try_pvclock_vtime():
>     tsc, p0 = rdtscp()
>     v0 = pvclock[p0].version
>     tsc, p = rdtscp()
>     t = pvclock_time(pvclock[p], tsc)
>     if p != p0 or pvclock[p].version != v0:
>         raise Exception("Processor or timebase change under our feet")
>     return t
>
> def pvclock_time():
>     while True:
>         try:
>             return try_pvclock_vtime()
>         except:
>             pass
>
> So, two rdtscps and two compares.

Yep, that would work.
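
For illustration, a minimal runnable sketch of that scheme with rdtscp
simulated by a scripted list of (tsc, cpu) pairs. FakeCpu, Retry, scale()
and pvclock_vtime() are hypothetical names for this sketch; the record
fields follow the pvclock_vcpu_time_info layout, and the scaling step is
my reading of the usual shift-then-multiply conversion, so treat it as an
assumption rather than the patch's code.

```python
class Retry(Exception):
    """Processor or timebase changed under our feet."""


class FakeCpu:
    """Simulated rdtscp: each call returns the next scripted (tsc, cpu)."""

    def __init__(self, script):
        self.script = list(script)

    def rdtscp(self):
        return self.script.pop(0)


def scale(info, tsc):
    """Convert a tsc reading to nanoseconds using one cpu's parameters."""
    delta = tsc - info["tsc_timestamp"]
    shift = info["tsc_shift"]
    delta = delta << shift if shift >= 0 else delta >> -shift
    # tsc_to_system_mul is treated as a 32.32 fixed-point multiplier.
    return info["system_time"] + ((delta * info["tsc_to_system_mul"]) >> 32)


def try_pvclock_vtime(cpu, pvclock):
    _, p0 = cpu.rdtscp()                 # first read: which cpu are we on?
    v0 = pvclock[p0]["version"]
    tsc, p = cpu.rdtscp()                # second read: tsc and cpu together
    t = scale(pvclock[p], tsc)
    if p != p0 or pvclock[p]["version"] != v0:
        raise Retry()
    return t


def pvclock_vtime(cpu, pvclock):
    while True:
        try:
            return try_pvclock_vtime(cpu, pvclock)
        except Retry:
            pass
```

Feeding it a script where the cpu number changes between the two reads
makes the first attempt raise Retry, and the loop then succeeds on the
next pair of reads.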

> It's sufficient to increment a version counter on thread migration, no
> need to do it on context switch.
>

That's true; switch_out is a pessimistic approximation of that. But is
there a convenient hook to test for migration?

J