Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue

From: Borislav Petkov
Date: Mon Apr 27 2015 - 15:12:10 EST


On Mon, Apr 27, 2015 at 08:38:54PM +0200, Borislav Petkov wrote:
> I'm running them now and will report numbers relative to the last run
> once it is done. And those numbers should in practice get even better if
> we revert to the simpler canonical-ness check but let's see...

Results are done. New row is F: which is with the F16h NOPs.

With all else being equal and with this change on top:

---
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index aef653193160..d713080005ef 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -227,6 +227,14 @@ void __init arch_init_ideal_nops(void)
 #endif
 		}
 		break;
+
+	case X86_VENDOR_AMD:
+		if (boot_cpu_data.x86 == 0x16) {
+			ideal_nops = p6_nops;
+			return;
+		}
+
+		/* fall through */
 	default:
 #ifdef CONFIG_X86_64
 		ideal_nops = k8_nops;
---

... cycles, instructions, branches, branch-misses, context-switches
drop or remain roughly the same. BUT(!) the timings increase:
cpu-clock/task-clock and the duration of the workload are the worst of
all the runs.

So either those NOPs are not really optimal (i.e., trusting the manuals
and so on :-)) or it is their alignment.

But look at the chapter in the manual - "2.7.2.1 Encoding Padding for
Loop Alignment" - those NOPs are supposed to be used as padding, so
they themselves will not necessarily be aligned when you use them to pad
stuff.

Or maybe using the longer NOPs is worse than using the shorter 4-byte
ones with three 0x66 prefixes, which should "flow" through the pipe more
easily due to their smaller length.
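
To make the length trade-off concrete, here is a small user-space
sketch (not kernel code). It pads an 8-byte hole either with two 4-byte
0x66-prefixed NOPs or with a single 8-byte NOPL. The byte sequences are
the encodings as I remember them from the kernel's NOP tables, and
pad_with() is a hypothetical helper made up for this illustration:

```c
#include <stddef.h>
#include <string.h>

/* 4-byte NOP: three 0x66 operand-size prefixes followed by 0x90 (nop). */
static const unsigned char k8_nop4[4] = { 0x66, 0x66, 0x66, 0x90 };

/* 8-byte NOP: nopl 0x0(%rax,%rax,1), i.e. 0f 1f 84 00 plus a zero disp32. */
static const unsigned char p6_nop8[8] = {
	0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/*
 * Hypothetical helper: fill buf[0..len) with copies of one NOP encoding.
 * Returns the number of instructions emitted, or 0 if len is not a
 * multiple of nop_len (nothing is written in that case).
 */
static size_t pad_with(unsigned char *buf, size_t len,
		       const unsigned char *nop, size_t nop_len)
{
	size_t off;

	if (nop_len == 0 || len % nop_len != 0)
		return 0;

	for (off = 0; off < len; off += nop_len)
		memcpy(buf + off, nop, nop_len);

	return len / nop_len;
}
```

For an 8-byte hole, pad_with(buf, 8, k8_nop4, 4) emits two instructions
and pad_with(buf, 8, p6_nop8, 8) emits one. So the choice is between
fewer but longer instructions and more but shorter ones; which of the
two the decoder prefers is exactly what the measurements are trying to
find out.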

Or something completely different...

Oh well, enough measurements for today - will do the rc1 measurement
tomorrow.

Thanks.

---
Performance counter stats for 'system wide' (10 runs):

A: 2835570.145246 cpu-clock (msec) ( +- 0.02% ) [100.00%]
B: 2833364.074970 cpu-clock (msec) ( +- 0.04% ) [100.00%]
C: 2834708.335431 cpu-clock (msec) ( +- 0.02% ) [100.00%]
D: 2835055.118431 cpu-clock (msec) ( +- 0.01% ) [100.00%]
E: 2833115.118624 cpu-clock (msec) ( +- 0.06% ) [100.00%]
F: 2835863.670798 cpu-clock (msec) ( +- 0.02% ) [100.00%]

A: 2835570.099981 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
B: 2833364.073633 task-clock (msec) # 3.996 CPUs utilized ( +- 0.04% ) [100.00%]
C: 2834708.350387 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
D: 2835055.094383 task-clock (msec) # 3.996 CPUs utilized ( +- 0.01% ) [100.00%]
E: 2833115.145292 task-clock (msec) # 3.996 CPUs utilized ( +- 0.06% ) [100.00%]
F: 2835863.719556 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]

A: 5,591,213,166,613 cycles # 1.972 GHz ( +- 0.03% ) [75.00%]
B: 5,585,023,802,888 cycles # 1.971 GHz ( +- 0.03% ) [75.00%]
C: 5,587,983,212,758 cycles # 1.971 GHz ( +- 0.02% ) [75.00%]
D: 5,584,838,532,936 cycles # 1.970 GHz ( +- 0.03% ) [75.00%]
E: 5,583,979,727,842 cycles # 1.971 GHz ( +- 0.05% ) [75.00%]
F: 5,581,639,840,197 cycles # 1.968 GHz ( +- 0.03% ) [75.00%]

A: 3,106,707,101,530 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]
B: 3,106,632,251,528 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
C: 3,106,265,958,142 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
D: 3,106,294,801,185 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
E: 3,106,381,223,355 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]
F: 3,105,996,162,436 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]

A: 683,676,044,429 branches # 241.107 M/sec ( +- 0.01% ) [75.00%]
B: 683,670,899,595 branches # 241.293 M/sec ( +- 0.01% ) [75.00%]
C: 683,675,772,858 branches # 241.180 M/sec ( +- 0.01% ) [75.00%]
D: 683,683,533,664 branches # 241.154 M/sec ( +- 0.00% ) [75.00%]
E: 683,648,518,667 branches # 241.306 M/sec ( +- 0.01% ) [75.00%]
F: 683,663,028,656 branches # 241.078 M/sec ( +- 0.00% ) [75.00%]

A: 43,829,535,008 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
B: 43,844,118,416 branch-misses # 6.41% of all branches ( +- 0.03% ) [75.00%]
C: 43,819,871,086 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
D: 43,795,107,998 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
E: 43,801,985,070 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
F: 43,804,449,271 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]

A: 2,030,357 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
B: 2,029,313 context-switches # 0.716 K/sec ( +- 0.05% ) [100.00%]
C: 2,028,566 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
D: 2,028,895 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
E: 2,031,008 context-switches # 0.717 K/sec ( +- 0.09% ) [100.00%]
F: 2,028,132 context-switches # 0.715 K/sec ( +- 0.05% ) [100.00%]

A: 52,421 migrations # 0.018 K/sec ( +- 1.13% )
B: 52,049 migrations # 0.018 K/sec ( +- 1.02% )
C: 51,365 migrations # 0.018 K/sec ( +- 0.92% )
D: 51,766 migrations # 0.018 K/sec ( +- 1.11% )
E: 53,047 migrations # 0.019 K/sec ( +- 1.08% )
F: 51,447 migrations # 0.018 K/sec ( +- 0.86% )

A: 709.528485252 seconds time elapsed ( +- 0.02% )
B: 708.976557288 seconds time elapsed ( +- 0.04% )
C: 709.312844791 seconds time elapsed ( +- 0.02% )
D: 709.400050112 seconds time elapsed ( +- 0.01% )
E: 708.914562508 seconds time elapsed ( +- 0.06% )
F: 709.602255085 seconds time elapsed ( +- 0.02% )
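
To put the elapsed-time rows into perspective, here is a rough sketch
which computes how far each row is from the fastest one and prints that
next to the reported spread. The numbers are copied from the table
above; the function and array names are made up for this illustration:

```c
#include <stddef.h>
#include <stdio.h>

/* "seconds time elapsed" per row, copied from the table above. */
static const char row_name[] = { 'A', 'B', 'C', 'D', 'E', 'F' };
static const double elapsed[] = {
	709.528485252, 708.976557288, 709.312844791,
	709.400050112, 708.914562508, 709.602255085,
};
/* Reported run-to-run spread (the "+-" column), in percent. */
static const double spread_pct[] = { 0.02, 0.04, 0.02, 0.01, 0.06, 0.02 };

#define NROWS (sizeof(elapsed) / sizeof(elapsed[0]))

/* Index of the row with the smallest elapsed time. */
static size_t fastest_row(void)
{
	size_t i, best = 0;

	for (i = 1; i < NROWS; i++)
		if (elapsed[i] < elapsed[best])
			best = i;
	return best;
}

/* How much slower row i is than row base, in percent of base. */
static double slowdown_pct(size_t i, size_t base)
{
	return (elapsed[i] - elapsed[base]) / elapsed[base] * 100.0;
}

/* Print each row's distance from the fastest row next to its spread. */
static void print_report(void)
{
	size_t i, best = fastest_row();

	for (i = 0; i < NROWS; i++)
		printf("%c: %+.3f%% vs %c (spread +- %.2f%%)\n",
		       row_name[i], slowdown_pct(i, best),
		       row_name[best], spread_pct[i]);
}
```

Calling print_report() from main() shows F as the slowest row, roughly
0.1% behind E, the fastest one. That is a small gap compared to the
per-row spread, so it should be read with some care.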

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.