Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available

From: H. Peter Anvin
Date: Wed Feb 17 2010 - 17:40:19 EST


On 02/17/2010 03:42 AM, Luca Barbieri wrote:
> This patch uses SSE movlps to perform 64-bit atomic reads and writes.
>
> According to Intel manuals, all aligned 64-bit reads and writes are
> atomic, which should include movlps.
>
> To do this, we need to disable preempt, clts if TS was set, and
> restore TS.
>
> If we don't need to change TS, using SSE is much faster.
>
> Otherwise, it should be essentially even, with the fastest method
> depending on the specific architecture.
>
> Another important point is that with SSE atomic64_read can keep the
> cacheline in shared state.
>
> If we could keep TS off and reenable it when returning to userspace,
> this would be even faster, but this is left for a later patch.
>
> We use SSE because we can just save the low part of %xmm0, whereas using
> the FPU or MMX requires at least saving the environment, and seems
> impossible to do fast.
>
> Signed-off-by: Luca Barbieri <luca@xxxxxxxxxxxxxxxxx>
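
For readers unfamiliar with movlps, here is a minimal userspace sketch of
the 64-bit access pattern the patch description refers to. It is not the
patch's code: the helper names sse_read64 and sse_set64 are hypothetical,
it uses compiler intrinsics rather than inline asm, and it omits everything
the kernel version would need (disabling preemption, TS handling, saving
the low part of %xmm0). The compiler is also free to pick a different
64-bit move than movlps, so treat this as an illustration of the idea only.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadl_pi, _mm_storel_pi */

/*
 * Hypothetical helper: read a 64-bit value through the low half of an
 * XMM register. _mm_loadl_pi typically compiles to a movlps load, which
 * moves the 8 bytes in a single access when *p is 8-byte aligned.
 */
static inline uint64_t sse_read64(const uint64_t *p)
{
    __m128 v = _mm_setzero_ps();
    uint64_t out;

    v = _mm_loadl_pi(v, (const __m64 *)p);  /* load 8 bytes into low half */
    _mm_storel_pi((__m64 *)&out, v);        /* copy low half to a local   */
    return out;
}

/*
 * Hypothetical helper: write a 64-bit value through the low half of an
 * XMM register. The final _mm_storel_pi is the single 8-byte store to *p.
 */
static inline void sse_set64(uint64_t *p, uint64_t val)
{
    __m128 v = _mm_setzero_ps();

    v = _mm_loadl_pi(v, (const __m64 *)&val);  /* stage val in low half   */
    _mm_storel_pi((__m64 *)p, v);              /* one 8-byte store to *p  */
}
```

Because the load is a plain read rather than a locked read-modify-write
such as cmpxchg8b, it does not need exclusive ownership of the cacheline,
which is the "shared state" point made above.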

I'm a bit unhappy about this patch. It seems to violate the assumption
that we only ever use the FPU state guarded by
kernel_fpu_begin()..kernel_fpu_end() and instead it uses a local hack,
which seems like a recipe for all kinds of very subtle problems down the
line.

Unless the performance advantage is provably very compelling, I'm
inclined to say that this is not worth it.

-hpa