Re: 4.4-rc5 Setting hardware breakpoint in int_ret_from_sys_call causes triple fault/reboot

From: Jeff Merkey
Date: Wed Dec 16 2015 - 19:31:13 EST


On 12/16/15, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Dec 16, 2015 3:12 PM, "Jeff Merkey" <linux.mdb@xxxxxxxxx> wrote:
>>
>> Setting a hardware breakpoint at the
>>
>> rex64 sysret
>>
>> instruction at the end of int_ret_from_sys_call causes the system to
>> triple fault and reboot when the breakpoint is triggered. Appears to be
>> related to the same problem as the lockup.
>>
>> This function can be stepped over and traced through with the TRAP
>> FLAG set so long as a hardware breakpoint is set somewhere in the
>> function. Otherwise upon exit the system hard hangs. If you break
>> exactly on that instruction -- reboot. If you break a few
>> instructions before it and single step through the call it works. If
>> you step through the call with no breakpoint the system hard hangs.
>> Same behavior as when you try to step from inside an nmi handler.
>> Looks related.
>
> You're probably encountering the user mode RSP when SYSRET happens.
>
> --Andy
>

Hi Andy,

Could be, but I am getting a double fault message with an error code
of 0, which scrolls off the screen when the triple fault hits. It
flashes by too quickly to read the function address. If I had a logic
analyzer with an inverse assembler I would already have it. I assume a
user mode RSP would clear the TRAP flag, and that does not explain why
it works if I set a breakpoint right above the instruction and then
step over it, which I can do without the triple fault.
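
For readers following along, here is a minimal sketch of the two mechanisms being compared above: an execution breakpoint armed through DR7, and single-stepping through the trap flag in EFLAGS. The bit positions are the architectural x86 ones. The function and macro names are hypothetical and are not taken from mdb or the kernel.

```c
#include <stdint.h>

/* EFLAGS.TF (bit 8): the CPU raises #DB after each instruction while set. */
#define SKETCH_EFLAGS_TF (1u << 8)

/*
 * DR7 layout for breakpoint slot n (0..3):
 *   bit 2n               Ln    local enable
 *   bit 2n + 1           Gn    global enable
 *   bits 16+4n .. 17+4n  R/Wn  (00 = break on instruction execution)
 *   bits 18+4n .. 19+4n  LENn  (00 is required for execution breakpoints)
 *
 * Hypothetical helper: returns a DR7 value with slot `slot` armed as a
 * globally enabled execution breakpoint. The address itself would go
 * into the matching DR0..DR3 register, which is not modeled here.
 */
static uint32_t sketch_dr7_exec_breakpoint(uint32_t dr7, unsigned slot)
{
    uint32_t type_len_mask = 0xFu << (16 + 4 * slot);

    dr7 &= ~type_len_mask;        /* R/Wn = 00, LENn = 00 */
    dr7 |= 1u << (2 * slot + 1);  /* set Gn */
    return dr7;
}

/* Hypothetical helper: returns the flags value with single-step enabled. */
static uint64_t sketch_set_trap_flag(uint64_t flags)
{
    return flags | SKETCH_EFLAGS_TF;
}
```

Both paths deliver the same #DB exception, so the difference in behavior described above would have to come from the state at the moment it is raised rather than from the vector itself.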

It is easy to reproduce. Download the mdb debugger for 4.3.3, apply it
to 4.4-rc5, modprobe mdb, echo a > /proc/sysrq-trigger, then
u int_ret_from_sys_call and scroll until you get to the swapgs followed
by rex64 sysret. Set a hardware breakpoint at that address, i.e.
b ffffffff81673ae1 (or whatever address the swapgs instruction is at),
then step through with t a few times (it should just return after
rex64 sysret since it returns to user space). Then set a breakpoint at
the rex64 sysret instruction, b <address>, let it break at the
instruction, then hit g for go and watch the fireworks: it will try
to print a double fault message and then reboot.

I handle the whole user RSP thing: I just return if I see regs set to
user space. This looks like some sort of problem in the exception
handlers.
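
As an illustration of the kind of check described above, here is a minimal user-space sketch in the spirit of the kernel's user_mode(regs) test, which looks at the privilege bits of the saved CS. This is an assumption about what "regs set to user space" means. The struct, the names, and the entry function are hypothetical and are not mdb's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct pt_regs. */
struct sketch_regs {
    uint64_t ip;
    uint64_t cs;
    uint64_t flags;
    uint64_t sp;
};

#define SKETCH_RPL_MASK 0x3u  /* low two bits of CS hold the privilege level */
#define SKETCH_USER_RPL 0x3u  /* ring 3 */

/* True if the saved CS says the exception was taken in user mode. */
static bool sketch_user_mode(const struct sketch_regs *regs)
{
    return (regs->cs & SKETCH_RPL_MASK) == SKETCH_USER_RPL;
}

/*
 * Hypothetical debugger entry filter: returns true if the debugger should
 * take the exception, false if it should return and leave it to the
 * normal handlers because the saved state is user space.
 */
static bool sketch_debugger_should_enter(const struct sketch_regs *regs)
{
    return !sketch_user_mode(regs);
}
```

With the usual x86_64 Linux selectors (assumed here: 0x10 for kernel code, 0x33 for user code), a frame with cs = 0x10 would be taken by the debugger and a frame with cs = 0x33 would be skipped. A check like this only looks at CS and says nothing about which stack RSP points to.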

Jeff