Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations

From: Andy Lutomirski
Date: Thu Jun 29 2017 - 18:59:12 EST

--Andy

> On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
>> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
>>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
>>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>>> There's a bug here that will need a small change to the entry code.
>>>>>
>>>>> Mike Galbraith reported:
>>>>>
>>>>> WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
>>>>>
>>>>> After some looking I found that it's caused by the following code
>>>>> snippet in the 'interrupt' macro in entry_64.S:
>>>>>
>>>>> /*
>>>>> * Save previous stack pointer, optionally switch to interrupt stack.
>>>>> * irq_count is used to check if a CPU is already on an interrupt stack
>>>>> * or not. While this is essentially redundant with preempt_count it is
>>>>> * a little cheaper to use a separate counter in the PDA (short of
>>>>> * moving irq_enter into assembly, which would be too much work)
>>>>> */
>>>>> movq %rsp, %rdi
>>>>> incl PER_CPU_VAR(irq_count)
>>>>> cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp
>>>>> UNWIND_HINT_REGS base=rdi
>>>>> pushq %rdi
>>>>> UNWIND_HINT_REGS indirect=1
>>>>>
>>>>> The problem is that it's changing the stack pointer *before* writing the
>>>>> previous stack pointer (push %rdi). So when unwinding from an NMI which
>>>>> hit between the rsp write and the rdi push, the unwinder tries to access
>>>>> the regs on the previous stack (by reading rdi), but the previous stack
>>>>> pointer isn't there yet, so the access is considered out of bounds.
>>>>
>>>> Ugh, that code. Does this problem go away with this patch applied:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
>>>>
>>>> If so, want to update the patch for new kernels (shouldn't conflict
>>>> with anything except your unwind hints)?
>>>
>>> I don't think that patch will fix it, because it still updates rsp
>>> *before* writing the old rsp on the new stack. So there's still a
>>> window where the "previous stack" pointer is missing.
>>
>> But it's in a register. Is undwarf not able to grok that?
>
> Sorry, I didn't explain it very well. Undwarf can find the regs pointer
> in rdi, it just doesn't trust its value.
>
> See the stack_info.next_sp field, which is set in in_irq_stack():
>
> /*
> * The next stack pointer is the first thing pushed by the entry code
> * after switching to the irq stack.
> */
> info->next_sp = (unsigned long *)*(end - 1);
>
> It's a safety mechanism. The unwinder needs the last word of the irq
> stack page to point to the previous stack. That way it can double check
> that the stack pointer it calculates is within the bounds of either the
> current stack or the previous stack.
>
> In the above code, the previous stack pointer (or next stack pointer,
> depending on your perspective) hasn't been set up before it switches
> stacks. So the unwinder reads an uninitialized value into
> info->next_sp, and compares that with the regs pointer, and then stops
> the unwind because it thinks it went off into the weeds.
>

That should be manageable, though, I think. With my patch applied (and maybe even without it), the only exception to that rule is when regs->sp points just above the top of the IRQ stack and the next instruction is a push of the register that holds the previous stack pointer. In that case, the register's value is exactly as trustworthy as the value the normal rule would read from the stack.* Can you teach the unwinder that this case is okay?

* If an NMI hits right there, this relies on unwinding out of the NMI correctly. But the usual check that the target stack is a valid stack should keep us from going off into the weeds regardless.
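The C code below is a toy model of the window Josh describes and of the exception proposed above. It is only a sketch under heavy assumptions. The following pieces are hypothetical and do not come from the undwarf/ORC code or the entry code:

- the two arrays standing in for the IRQ and task stacks;
- `model_sp` and `model_rdi`;
- `switch_without_push()` and `push_rdi()`;
- `regs_ok_strict()` and `regs_ok_with_exception()`.

"Valid stack" is reduced to "lies on the one task stack".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the stack-switch window. Names, stack sizes, and the
 * simplified checks are hypothetical; this is not the kernel's unwinder.
 */
#define STACK_WORDS 64

static uintptr_t irq_stack[STACK_WORDS];
static uintptr_t task_stack[STACK_WORDS];

/* Modeled registers: %rsp and %rdi from the quoted entry snippet. */
static uintptr_t model_sp;
static uintptr_t model_rdi;

static uintptr_t stack_begin(const uintptr_t *s) { return (uintptr_t)s; }
static uintptr_t stack_end(const uintptr_t *s) { return (uintptr_t)(s + STACK_WORDS); }

static bool in_range(uintptr_t addr, uintptr_t begin, uintptr_t end)
{
	return addr >= begin && addr < end;
}

/* Like in_irq_stack(): the top word of the IRQ stack is taken as next_sp. */
static uintptr_t irq_next_sp(void)
{
	return irq_stack[STACK_WORDS - 1];
}

/* movq %rsp, %rdi; cmovzq ..., %rsp: stacks switched, nothing pushed yet. */
static void switch_without_push(uintptr_t task_sp)
{
	model_rdi = task_sp;
	model_sp = stack_end(irq_stack);
}

/* pushq %rdi: the top word of the IRQ stack now holds the old %rsp. */
static void push_rdi(void)
{
	model_sp -= sizeof(uintptr_t);
	*(uintptr_t *)model_sp = model_rdi;
}

/*
 * Current rule, simplified: a regs pointer that is not on the IRQ stack is
 * trusted only if it lies on the stack that next_sp points into.
 */
static bool regs_ok_strict(uintptr_t regs)
{
	uintptr_t tb = stack_begin(task_stack), te = stack_end(task_stack);

	if (in_range(regs, stack_begin(irq_stack), stack_end(irq_stack)))
		return true;
	return in_range(irq_next_sp(), tb, te) && in_range(regs, tb, te);
}

/*
 * Proposed exception, sketched: if %rsp sits exactly at the top of the IRQ
 * stack and the next instruction pushes the register holding regs, trust
 * that register, subject only to a "target is a valid stack" check (here,
 * just the task stack).
 */
static bool regs_ok_with_exception(uintptr_t regs, uintptr_t sp,
				   bool next_insn_pushes_regs)
{
	if (sp == stack_end(irq_stack) && next_insn_pushes_regs)
		return in_range(regs, stack_begin(task_stack),
				stack_end(task_stack));
	return regs_ok_strict(regs);
}
```

In the model, start with the top IRQ-stack word stale (for example 0) and call `switch_without_push()`. The strict rule then rejects the regs pointer, because next_sp does not land on any stack. `regs_ok_with_exception(model_rdi, model_sp, true)` accepts it. Once `push_rdi()` has run, the strict rule accepts it as well. The exception still requires the pointer to land on a known stack; that is the safety net the footnote relies on.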

> --
> Josh