Re: [patch 15/14] x86/dumpstack/64: Speedup in_exception_stack()

From: Andy Lutomirski
Date: Tue Apr 02 2019 - 20:36:07 EST

> On Apr 2, 2019, at 1:29 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
>> On Tue, 2 Apr 2019, Thomas Gleixner wrote:
>> On Tue, 2 Apr 2019, Andy Lutomirski wrote:
>>>> On Apr 2, 2019, at 9:48 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>>>
>>>>>> On Tue, 2 Apr 2019, Josh Poimboeuf wrote:
>>>>>> On Tue, Apr 02, 2019 at 12:19:46PM +0200, Thomas Gleixner wrote:
>>>>>> +/*
>>>>>> + * Array of exception stack page descriptors. If the stack is larger than
>>>>>> + * PAGE_SIZE, all pages covering a particular stack will have the same
>>>>>> + * info.
>>>>>> + */
>>>>>> +static const struct estack_pages estack_pages[ESTACK_PAGES] ____cacheline_aligned = {
>>>>>> + [CONDRANGE(DF)] = ESTACK_PAGE(DOUBLEFAULT_IST, DF),
>>>>>> + [CONDRANGE(NMI)] = ESTACK_PAGE(NMI_IST, NMI),
>>>>>> + [PAGERANGE(DB)] = ESTACK_PAGE(DEBUG_IST, DB),
>>>>>> + [CONDRANGE(MCE)] = ESTACK_PAGE(MCE_IST, MCE),
>>>>>
>>>>> It would be nice if the *_IST macro naming aligned with the struct
>>>>> cea_exception_stacks field naming. Then you could just do, e.g.
>>>>> ESTACKPAGE(DF).
>>>>
>>>> Yes, lemme fix that up.
>>>>
>>>>> Also it's a bit unfortunate that some of the stack size knowledge is
>>>>> hard-coded here, i.e #DB always being > 1 page and non-#DB being
>>>>> sometimes 1 page.
>>>>
>>>> The problem is that there is no way to make this macro maze conditional on
>>>> sizeof(). But my macro foo is rusty.
>>>
>>> How about a much better fix: make the DB stack be the same size as all
>>> the others and just have 4 of them (DB0, DB1, DB2, and DB3). After all,
>>> overflowing from one debug stack into another is just as much of a bug as
>>> overflowing into a different IST stack.
>>
>> That makes sense.
>
> Except that we just have two not four.
>
> It needs some tweaking of the ist_shift stuff in entry_64.S but that's not
> rocket science. Famous last words....
>

The ist_shift mess should probably be in C, but that's a big can of worms. That being said, why do we have it at all? Once upon a time, we'd do ICEBP from user mode (or a legit breakpoint), then send a signal and hit a data breakpoint, and we'd recurse. But we don't run user debug handlers on the IST stack at all anymore.

Maybe we can convince ourselves it's safe?

What we should do is check, on IST return, that we're not about to return to our own stack. Then we can at least properly panic.
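
To make the idea concrete, here is a rough userspace sketch of the check, not a patch. It assumes each IST stack can be described as a half-open range [begin, end) and that the stack pointer saved in the interrupted context is available at return time. The struct and helper names are made up for illustration, and the real check would live in or near the entry code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one IST stack as a half-open address range
 * [begin, end). The struct and helper names are made up for this sketch.
 */
struct ist_range {
	uintptr_t begin;	/* lowest address of the stack */
	uintptr_t end;		/* one past the highest address */
};

/*
 * Return true if the stack pointer saved in the interrupted context
 * lies inside the IST stack the handler is currently running on.
 * Returning to such a context means the stack was re-entered and its
 * frames may already be overwritten.
 */
static bool ist_return_to_own_stack(const struct ist_range *cur,
				    uintptr_t saved_sp)
{
	return saved_sp >= cur->begin && saved_sp < cur->end;
}
```

If the helper returns true on the IST return path, the handler was entered while already running on the same stack, so we would panic there instead of returning into a possibly clobbered frame.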