Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available

From: H. Peter Anvin
Date: Thu Feb 18 2010 - 10:25:21 EST


On 02/18/2010 02:27 AM, Luca Barbieri wrote:
>> CR changes are slow and synchronize the CPU. The later is always slow.
>>
>> It sounds like you didn't time it?
> I didn't, because I think it strongly depends on the microarchitecture
> and I don't have a comprehensive set of machines to test on, so it
> would just be a single data point.
>
> The lock prefix on cmpxchg8b is also serializing so it might be as bad.

No. LOCK isn't serializing in the same way CRx writes are.


> Anyway, if we use this, we should keep TS cleared in kernel mode and
> lazily restore it on return to userspace.
> This would make clts/stts performance mostly moot.

This is what kernel_fpu_begin/kernel_fpu_end is all about. We
definitely cannot leave TS cleared without the user space CPU state
moved to its home location, or we have yet another complicated state to
worry about.

I really feel that without a *strong* use case for this, there is
absolutely no point.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/