Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue

From: Andy Lutomirski
Date: Fri Apr 24 2015 - 16:21:32 EST


On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by replacing NULL SS values with __KERNEL_DS
> in __switch_to, thus ensuring that SYSRET never happens with SS set
> to NULL.
>
> This was exposed by a recent vDSO cleanup.
>
> Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
>
> Tested only on Intel, which isn't very interesting. I'll tidy up
> and send a test case, too, once Borislav confirms that it works.
>
> Please don't actually apply this until we're sure we understand the
> scope of the issue. If this doesn't affect SYSRETQ, then we might
> want to fix it up before SYSRETL to avoid impacting 64-bit processes
> at all.
>

After sleeping on it, I think I want to offer a different, more
complicated approach. AFAIK there are really only two ways that this
issue can be visible:

1. SYSRETL. We can fix that up in the AMD SYSRETL path. I think
there's a decent argument that that path is less performance-critical
than context switches.

2. SYSRETQ. The only way that I know of to see the problem is SYSRETQ
followed by a far jump or return. This is presumably *extremely*
rare.

What if we fixed #2 up in do_stack_segment? We should double-check
the docs, but I think that this will only ever manifest as #SS(0) with
regs->ss == __USER_DS and !user_mode_64bit(regs). We need to avoid
infinite retry loops, but this might be okay. I think that #SS(0)
from userspace under those conditions can *only* happen as a result of
this issue. Even if not, we could come up with a way to only retry
once per syscall (e.g. set some ti->status flag in the 64-bit syscall
path on AMD and clear it in do_stack_segment).
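
To make the conditions concrete, here is a rough user-space sketch of the
retry-once idea. It is not kernel code: every sketch_* name is hypothetical,
the 0x2b value for __USER_DS is an assumption about the x86_64 GDT layout,
and the actual fixup (reloading SS and returning to userspace) is left out
entirely. It only models the decision "is this #SS worth one retry?".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed selector value for __USER_DS on x86_64 (illustrative only). */
#define SKETCH_USER_DS 0x2bu

/* Hypothetical, stripped-down stand-in for the relevant pt_regs state. */
struct sketch_regs {
	uint16_t ss;          /* saved SS selector at fault time */
	bool from_user;       /* fault came from user mode */
	bool cs_is_64bit;     /* models user_mode_64bit(regs) */
};

/* Hypothetical stand-in for a ti->status flag. */
struct sketch_thread {
	bool ss_retry_armed;
};

/* Would be called in the 64-bit syscall path on affected CPUs. */
static void sketch_arm_on_syscall64(struct sketch_thread *ti)
{
	ti->ss_retry_armed = true;
}

/*
 * Would be called from the #SS handler. Returns true if the fault
 * matches the suspected SYSRET signature and no retry has been used
 * since the last 64-bit syscall. Clears the flag so that a second
 * fault falls through to the normal signal delivery path.
 */
static bool sketch_should_retry(const struct sketch_regs *regs,
				uint32_t error_code,
				struct sketch_thread *ti)
{
	if (error_code != 0)              /* only #SS(0) */
		return false;
	if (!regs->from_user)
		return false;
	if (regs->ss != SKETCH_USER_DS)
		return false;
	if (regs->cs_is_64bit)            /* need !user_mode_64bit(regs) */
		return false;
	if (!ti->ss_retry_armed)          /* already retried once */
		return false;

	ti->ss_retry_armed = false;
	return true;
}
```

The flag is what bounds the retries: a #SS(0) that matches but arrives
after the flag has been cleared is treated as a genuine fault.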

This might be way more trouble than it's worth. For one thing, we
need to be careful with the IRET fixup. Ick. So maybe this should be
written off as my useless ramblings.

NB: I suspect that all of this is irrelevant on Xen. Xen does its own
thing wrt sysret.

--Andy