Re: [PATCH REPOST^3] Run IST traps from user mode preemptive onprocess stack

From: Ingo Molnar
Date: Tue May 06 2008 - 10:34:36 EST



* Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:

> > This introduces two security bugs in one go:
> >
> > 1.) The IST stack pointer is elevated unconditionaly and with your
> > change we can now schedule away in the handler.
>
> Yes, but that's fine.

no, your patch is not fine at all. It introduces a security hole, as
Thomas pointed it out to you.

The IST pointer in the _TSS_ can go out of bounds by your patch! That
corruption can be controlled by malicious userspace. Read the analysis
from Thomas.

That percpu_tss.ist[] location is not just a random pointer. It is the
TSS that is read by the CPU on every trap. It is a security-critical
structure and every modification to it has to be treated very, very
carefully. If it goes out of bounds that leads to very nasty problems.

This is a pretty bad security bug, and your reply shows ignorance about
the code your patch is modifying - _after_ Thomas has pointed it out to
you.

> > this same CPU triggers the same trap and elevates it again.
>
> That's not possible generally. None of these interrupts can nest in a
> normal kernel.

read the analysis from Thomas. Here is a sample buggy sequence:

task #1 runs, does int3, goes on DEBUG_STACK IST stack,
we do -= 4096 of the tss.ist[DEBUG_STACK-1] pointer, handler schedules
task #2 runs, does int3, goes on DEBUG_STACK IST stack, handler schedules
we do -= 4096 of the tss.ist[DEBUG_STACK-1] pointer, handler schedules
task #3 runs, does int3, goes on DEBUG_STACK IST

=> poof, stack underflow, memory corruption of nearby data structures,
because the DEBUG_STACK is only 8KB...

this can be triggered in almost arbitrary depth, from user-space.

> Do you refer to the DEBUG_STACK ist add/dec? That is not needed for
> anything in tree to my knowledge.

Wrong. The pointer the subq/addq instructions modify are in fact used by
_EVERY SINGLE_ IST trap sequence that the 64-bit kernel executes (!).

This is the modification that the paranoidentry macro does to the TSS in
entry_64.S:

subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
...
addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)

that modifies the TSS _directly_. The TSS is directly read by the CPU on
every trap entry. The percpu_tss.ist[] array of pointers is then used by
the CPU for every single IST trap. (hw-debug, NMI, int3, double-fault,
stacksegment-overflow, MCE)

This is nothing new or unexpected, it is basic x86-64 IST architectural
behavior implemented in the CPU.

The bug you introduce is that if the handler schedules away (it blocks
or gets preempted on CONFIG_PREEMPT), it would keep the per-CPU IST
offset for that IST type decreased by 4096 => if done enough times the
pointer goes out of bounds and it's kaboom.

> > 2.) The IST stack pointer is unbalanced in the other direction as
> > well: it is incremented on CPUn then the handler is scheduled away
> > and migrated to CPUm. After return from the handler the IST stacks
> > of both CPUs are corrupted. Exploitable by unprivileged user-space
> > code as well.
>
> The IST is restored again after the handler. [...]

yes but it is restored on the wrong CPU, after the task has scheduled
and migrated - moving the IST pointer in the TSS out of bounds, so the
next IST trap (which user-space can trigger arbitrarily - just generate
a stream of INT3 breakpoints) will corrupt memory!

> [...] You're right that strictly wouldn't be needed and could be
> avoided, but i don't think it's exploitable in any ways.
>
> Regarding user controlled pt_regs: I think you're forgetting that
> x86-64 doesn't have vm86 mode and that we can always trust pt_regs
> segment indexes. On i386 you would be right, but not here.

Thomas is not forgetting anything.

Thomas is doing security analysis about how the hole in your patch can
be exploited: the wrong IST offsets are used to corrupt nearby data
structures - a malicious nonprivileged user task can modify a good
portion of the ~64 bytes of pt_regs at the end of every page near the
IST stack address (and randomly corrupt some bytes below that) - just by
virtue of generating an int3 (or hw-debug) trap with prepared register
contents.

You first need to understand the hole you are introducing to understand
the finer aspects of his security analysis though...

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/