[PATCH 3.1?] x86: Remove useless stts/clts pair in __switch_to

From: Andy Lutomirski
Date: Mon Jul 25 2011 - 06:06:14 EST


An stts/clts pair takes over 70 ns by itself on Sandy Bridge, and
when other things are going on it's apparently even worse. This
saves 10% on context switches between threads that both use extended
state.

Signed-off-by: Andy Lutomirski <luto@xxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxx>
Cc: Avi Kivity <avi@xxxxxxxxxx>
---

This is not as well tested as it should be (especially on 32-bit, where
I haven't actually tried compiling it), but I think this might be 3.1
material so I want to get it out for review before it's even more
unjustifiably late :)

Argument for inclusion in 3.1 (after a bit more testing):
- It's dead simple.
- It's a 10% speedup on context switching under the right conditions [1]
- It's unlikely to slow any workload down, since it doesn't add any work
anywhere.

Argument against:
- It's late.

[1] https://gitorious.org/linux-test-utils/linux-clock-tests/blobs/master/context_switch_latency.c

arch/x86/include/asm/i387.h | 10 ++++++++++
arch/x86/kernel/process_32.c | 10 ++++------
arch/x86/kernel/process_64.c | 7 +++----
3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index c9e09ea..9d2d08b 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -295,6 +295,16 @@ static inline void __unlazy_fpu(struct task_struct *tsk)
tsk->fpu_counter = 0;
}

+static inline void __unlazy_fpu_clts(struct task_struct *tsk)
+{
+ if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ __save_init_fpu(tsk);
+ } else {
+ tsk->fpu_counter = 0;
+ clts();
+ }
+}
+
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a3d0dc5..c707741 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -304,7 +304,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

- __unlazy_fpu(prev_p);
+ if (preload_fpu)
+ __unlazy_fpu_clts(prev_p);
+ else
+ __unlazy_fpu(prev_p);

/* we're going to use this soon, after a few expensive things */
if (preload_fpu)
@@ -348,11 +351,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
__switch_to_xtra(prev_p, next_p, tss);

- /* If we're going to preload the fpu context, make sure clts
- is run while we're batching the cpu state updates. */
- if (preload_fpu)
- clts();
-
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b1f3f53..272bddd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -419,11 +419,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
load_TLS(next, cpu);

/* Must be after DS reload */
- __unlazy_fpu(prev_p);
-
- /* Make sure cpu is ready for new context */
if (preload_fpu)
- clts();
+ __unlazy_fpu_clts(prev_p);
+ else
+ __unlazy_fpu(prev_p);

/*
* Leave lazy mode, flushing any hypercalls made here.
--
1.7.6
