Re: [GIT PULL] x86/asm changes for v5.6

From: Borislav Petkov
Date: Wed Jan 29 2020 - 08:26:33 EST


On Tue, Jan 28, 2020 at 12:06:53PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > ALTERNATIVE_2 \
> > "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> > "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \
> > "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
>
> Note the UNTESTED part.
>
> In particular, I didn't check what the priority for the alternatives
> is. Since FSRM being set always implies ERMS being set too, it may be
> that the ERMS case is always picked with the above code.
>
> So maybe the FSRM and ERMS lines need to be switched around, and
> somebody should add a comment to the ALTERNATIVE_2 macro about the
> priority rules for feature1 vs feature2 when both are set..
>
> IOW, testing most definitely required for that patch suggestion of mine..

So this is what is there now, before your patch (I've forced both
X86_FEATURE_FSRM and X86_FEATURE_ERMS on a BDW guest):

[ 4.238160] apply_alternatives: feat: 18*32+4, old: (__memmove+0x17/0x1a0 (ffffffff817d90d7) len: 10), repl: (ffffffff8251dbbb, len: 0), pad: 0
[ 4.239503] ffffffff817d90d7: old_insn: 48 83 fa 20 0f 82 f5 00 00 00

That's what is in vmlinux:

ffffffff817d90d7: 48 83 fa 20 cmp $0x20,%rdx
ffffffff817d90db: 0f 82 f5 00 00 00 jb ffffffff817d91d6

which is 10 bytes.

It gets replaced with:

[ 4.240194] ffffffff817d90d7: final_insn: 0f 1f 84 00 00 00 00 00 66 90

ffffffff817d90d7: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
ffffffff817d90de: 00
ffffffff817d90df: 66 90 xchg %ax,%ax

I.e., NOPed out.
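
For reference, the "feat:" values in these log lines are printed as
word*32+bit, so 18*32+4 is bit 4 of capability word 18 (X86_FEATURE_FSRM),
and the same layout applies to the 9*32+9 entry (X86_FEATURE_ERMS) further
down. A minimal userspace sketch of that decoding follows; feat_word(),
feat_bit() and feat_test() are hypothetical names for illustration, not
kernel helpers, and the capability array layout is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Split a "word*32+bit" feature number into its two parts. */
static unsigned int feat_word(unsigned int feat) { return feat / 32; }
static unsigned int feat_bit(unsigned int feat)  { return feat % 32; }

/*
 * Test a feature bit in an array of 32-bit capability words.
 * caps/nwords are assumptions of this sketch, not the kernel's data layout.
 */
static int feat_test(const uint32_t *caps, unsigned int nwords,
		     unsigned int feat)
{
	assert(feat_word(feat) < nwords);
	return (caps[feat_word(feat)] >> feat_bit(feat)) & 1;
}
```

With this, feat_word(18*32+4) is 18 and feat_bit(18*32+4) is 4.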

ERMS replaces the bytes *after* these 10 bytes, note the VA:

0xffffffff817d90d7 + 0xa = 0xffffffff817d90e1

[ 4.240917] apply_alternatives: feat: 9*32+9, old: (__memmove+0x21/0x1a0 (ffffffff817d90e1) len: 6), repl: (ffffffff8251dbbb, len: 6), pad: 6
[ 4.242209] ffffffff817d90e1: old_insn: 90 90 90 90 90 90
[ 4.242823] ffffffff8251dbbb: rpl_insn: 48 89 d1 f3 a4 c3
[ 4.243503] ffffffff817d90e1: final_insn: 48 89 d1 f3 a4 c3

which turns into

ffffffff817d90e1: 48 89 d1 mov %rdx,%rcx
ffffffff817d90e4: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff817d90e6: c3 retq

as expected.
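
The two patch sites above can be replayed on a plain byte buffer to check
the arithmetic: the first site is 10 bytes at __memmove+0x17, the second
is the 6 bytes of padding right behind it at __memmove+0x21. This is only
a userspace sketch; the byte values and offsets are copied from the log,
while patch(), fill_original() and apply_both() are made-up names and
memcpy() merely stands in for the kernel's actual text patching:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SITE1_OFF 0x17	/* __memmove+0x17: cmp/jb */
#define SITE1_LEN 10
#define SITE2_OFF 0x21	/* __memmove+0x21: 0x90 padding */
#define SITE2_LEN 6
#define WIN_LEN   (SITE1_LEN + SITE2_LEN)

/* Bytes as printed in the old_insn/final_insn/rpl_insn log lines. */
static const unsigned char site1_old[SITE1_LEN] =
	{ 0x48, 0x83, 0xfa, 0x20, 0x0f, 0x82, 0xf5, 0x00, 0x00, 0x00 };
static const unsigned char site1_final[SITE1_LEN] =
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x66, 0x90 };
static const unsigned char site2_repl[SITE2_LEN] =
	{ 0x48, 0x89, 0xd1, 0xf3, 0xa4, 0xc3 };

/* Overwrite len bytes at off inside a window of win_len bytes. */
static void patch(unsigned char *win, size_t win_len, size_t off,
		  const unsigned char *repl, size_t len)
{
	assert(off + len <= win_len);
	memcpy(win + off, repl, len);
}

/* The 16 bytes covering both sites as they are before patching. */
static void fill_original(unsigned char *win)
{
	memcpy(win, site1_old, SITE1_LEN);
	memset(win + SITE1_LEN, 0x90, SITE2_LEN);
}

/* First site gets NOPs, second site gets mov/rep movsb/retq. */
static void apply_both(unsigned char *win)
{
	patch(win, WIN_LEN, 0, site1_final, SITE1_LEN);
	patch(win, WIN_LEN, SITE2_OFF - SITE1_OFF, site2_repl, SITE2_LEN);
}
```

After fill_original() and apply_both(), the window holds the 10 NOP bytes
followed by 48 89 d1 f3 a4 c3, matching the final_insn lines.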

And yes, your idea to use ALTERNATIVE_2 makes sense, but as it is, it
triple-faults my guest. I'll debug it more later to find out why, when I
get a chance.

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg