Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry

From: Thomas Gleixner
Date: Mon Sep 29 2014 - 14:58:48 EST


On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
> > The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
> > syscall entry, should also clear the nested task (NT) flag to be safe from
> > userspace injection. Without this fix the application segmentation
> > faults on syscall return because of the changed meaning of the IRET
> > instruction.
> >
> > Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
> >
> > Signed-off-by: Anish Bhatt <anish@xxxxxxxxxxx>
> > Signed-off-by: Sebastian Lackner <sebastian@xxxxxxxxxxx>
> > ---
> > arch/x86/kernel/cpu/common.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index e4ab2b4..3126558 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1184,7 +1184,7 @@ void syscall_init(void)
> > /* Flags to clear on syscall */
> > wrmsrl(MSR_SYSCALL_MASK,
> > X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> > - X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> > + X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>
> Something's weird here, and at the very least the changelog is
> insufficiently informative.
>
> The Intel SDM says:
>
> If the NT flag is set and the processor is in IA-32e mode, the IRET
> instruction causes a general protection exception.
>
> Presumably interrupt delivery clears NT. I haven't spotted where that's
> documented yet.

Nope, that's unrelated.

See Volume 3, chapter 7.4 "Task linking":

"The previous task link field of the TSS (sometimes called the
“backlink”) and the NT flag in the EFLAGS register are used to return
execution to the previous task. EFLAGS.NT = 1 indicates that the
currently executing task is nested within the execution of another
task.

When a CALL instruction, an interrupt, or an exception causes a task
switch: the processor copies the segment selector for the current TSS
to the previous task link field of the TSS for the new task; it then
sets EFLAGS.NT = 1. If software uses an IRET instruction to suspend
the new task, the processor checks for EFLAGS.NT = 1; it then uses the
value in the previous task link field to return to the previous
task. See Figures 7-8."

Now, Linux does not care about that. Thread management is done purely
in software. So nothing uses and nothing can use the TSS backlink and
NT mode.

In IA-32e mode an IRET seeing EFLAGS.NT=1 will cause #GP. In non-IA-32e
mode it would simply explode by returning to TSS.back_link, which is
reliably NULL.
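
To make the sequence concrete, here is a rough user-space model of it,
not kernel code. The flag values are the architectural EFLAGS bit
positions; SYSCALL_MASK_OLD and SYSCALL_MASK_NEW mirror the two mask
values in the patch, and flags_after_syscall() and iret_model() are
hypothetical helpers made up for this illustration. It only assumes
that SYSCALL clears every flag bit set in the mask MSR and that IRET
behaves as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bits touched by the patch. */
#define X86_EFLAGS_TF   0x00000100u  /* trap flag */
#define X86_EFLAGS_IF   0x00000200u  /* interrupt enable */
#define X86_EFLAGS_DF   0x00000400u  /* direction flag */
#define X86_EFLAGS_IOPL 0x00003000u  /* I/O privilege level (2 bits) */
#define X86_EFLAGS_NT   0x00004000u  /* nested task */
#define X86_EFLAGS_AC   0x00040000u  /* alignment check */

/* Mask before and after the patch (names are illustrative only). */
#define SYSCALL_MASK_OLD (X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF | \
			  X86_EFLAGS_IOPL | X86_EFLAGS_AC)
#define SYSCALL_MASK_NEW (SYSCALL_MASK_OLD | X86_EFLAGS_NT)

/* Model of SYSCALL entry: every flag bit set in the mask is cleared. */
uint64_t flags_after_syscall(uint64_t user_flags, uint64_t mask)
{
	return user_flags & ~mask;
}

enum iret_outcome {
	IRET_NORMAL,       /* NT clear: ordinary return */
	IRET_GP_FAULT,     /* NT set in IA-32e mode: #GP */
	IRET_TASK_RETURN   /* NT set otherwise: return via TSS back link */
};

/* Model of what IRET does with the flags it sees at execution time. */
enum iret_outcome iret_model(uint64_t flags, int ia32e_mode)
{
	if (!(flags & X86_EFLAGS_NT))
		return IRET_NORMAL;
	return ia32e_mode ? IRET_GP_FAULT : IRET_TASK_RETURN;
}
```

Feeding user flags with NT set (say 0x4202) through
flags_after_syscall() with SYSCALL_MASK_OLD leaves NT in place, so
iret_model() in IA-32e mode yields IRET_GP_FAULT. With
SYSCALL_MASK_NEW the bit is gone and the result is IRET_NORMAL.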

So there is nothing to see here other than the stupid user space task
fiddling with the NT flag being killed rightfully.

Thanks,

tglx