Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry

From: Andy Lutomirski
Date: Wed May 01 2019 - 20:33:12 EST




> On May 1, 2019, at 2:04 PM, Bae, Chang Seok <chang.seok.bae@xxxxxxxxx> wrote:
>
>
>> On May 1, 2019, at 13:25, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>>
>>
>>> On May 1, 2019, at 1:21 PM, Bae, Chang Seok <chang.seok.bae@xxxxxxxxx> wrote:
>>>
>>>
>>>>> On May 1, 2019, at 11:01, Bae, Chang Seok <chang.seok.bae@xxxxxxxxx> wrote:
>>>>>
>>>>> On May 1, 2019, at 10:40, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On Wed, May 1, 2019 at 6:52 AM Bae, Chang Seok <chang.seok.bae@xxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>>> On Apr 5, 2019, at 06:50, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Furthermore, if you folks even want me to review this series, the ptrace tests need to be in place. On inspection of the current code (after the debacle a few releases back), it appears the SETREGSETâs effect depends on the current values in the registers â it does not actually seem to reliably load the whole state. So my confidence will be greatly increased if your series first adds a test that detects that bug (and fails!), then fixes the bug in a tiny little patch, then adds FSGSBASE, and keeps the test working.
>>>>>>>
>>>>>>
>>>>>> I think I need to understand the issue. Appreciate if you can elaborate a little bit.
>>>>>>
>>>>>
>>>>> This patch series gives a particular behavior to PTRACE_SETREGS and
>>>>> PTRACE_POKEUSER. There should be a test case that validates that
>>>>> behavior, including testing the weird cases where gs != 0 and gsbase
>>>>> contains unusual values. Some existing tests might be pretty close to
>>>>> doing what's needed.
>>>>>
>>>>> Beyond that, the current putreg() code does this:
>>>>>
>>>>> case offsetof(struct user_regs_struct,gs_base):
>>>>> /*
>>>>> * Exactly the same here as the %fs handling above.
>>>>> */
>>>>> if (value >= TASK_SIZE_MAX)
>>>>> return -EIO;
>>>>> if (child->thread.gsbase != value)
>>>>> return do_arch_prctl_64(child, ARCH_SET_GS, value);
>>>>> return 0;
>>>>>
>>>>> and do_arch_prctl_64(), in turn, does this:
>>>>>
>>>>> case ARCH_SET_GS: {
>>>>> if (unlikely(arg2 >= TASK_SIZE_MAX))
>>>>> return -EPERM;
>>>>>
>>>>> preempt_disable();
>>>>> /*
>>>>> * ARCH_SET_GS has always overwritten the index
>>>>> * and the base. Zero is the most sensible value
>>>>> * to put in the index, and is the only value that
>>>>> * makes any sense if FSGSBASE is unavailable.
>>>>> */
>>>>> if (task == current) {
>>>>> [not used for ptrace]
>>>>> } else {
>>>>> task->thread.gsindex = 0;
>>>>> x86_gsbase_write_task(task, arg2);
>>>>> }
>>>>>
>>>>> ...
>>>>>
>>>>> So writing the value that was already there to gsbase via putreg()
>>>>> does nothing, but writing a *different* value implicitly clears gs,
>>>>> but writing a different value will clear gs.
>>>>>
>>>>> This behavior is, AFAICT, complete nonsense. It happens to work
>>>>> because usually gdb writes the same value back, and, in any case, gs
>>>>> comes *after* gsbase in user_regs_struct, so gs gets replaced anyway.
>>>>> But I think that this behavior should be fixed up and probably tested.
>>>>> Certainly the behavior should *not* be the same on a fsgsbase kernel,
>>>>> and and the fsgsbase behavior definitely needs a selftest.
>>>>
>>>> Okay, got the point; now crystal clear.
>>>>
>>>> I have my own test case for that though, need to find a very simple and
>>>> acceptable solution.
>>>>
>>>
>>> One solution that I recall, HPA once suggested, is:
>>> Write registers in a reverse order from user_regs_struct, for SETREGS
>>>
>>> Assuming these for clarification, first:
>>> * old and new index != 0
>>> * taking GS as an example though, should be the same with FS
>>>
>>> Then, interesting cases would be something like these, without FSGSBASE:
>>> Case (a), when index only changed to (new index):
>>> (Then, the result after SETREGS would be)
>>> GS = (new index), GSBASE = the base fetched from (new index)
>>> Case (b), when base only changed to (new base):
>>> Case (c), when both are changed:
>>> GS = 0, GSBASE = (new base)
>>>
>>> Now, with FSGSBASE:
>>> Case (a):
>>> GS = (new index), GSBASE = (old base)
>>> Case (b):
>>> GS = (old index), GSBASE = (new base)
>>> Case (c):
>>> GS = (new index), GSBASE = (new base)
>>>
>>> As a reference, today's kernel behavior, without FSGSBASE:
>>> Case (a):
>>> GS = (new index), GSBASE = the base fetched from (new index)
>>> Case (b):
>>> GS = (old index), GSBASE = (old base)
>>> Case (c):
>>> GS = (new index), GSBASE = the base fetched from (new index)
>>>
>>> Now, with that reverse ordering and taking that "GSBASE is important" [1],
>>> it looks like to be working in terms of its base value:
>>> Case (b) and (c) will behave the same as with FSGSBASE
>>> Case (a) still differs between w/ and w/o FSGSBASE.
>>> Well, I'd say this bit comes from the 'new model' vs. the 'leagcy
>>> model'. So, then okay with that. Any thoughts?
>>>
>>>
>>>
>>
>> This seems more complicated than needed. How about we just remove all the magic and make putreg on the base registers never change the selector.
>>
>
> Hmm, just wonder what's benefit in terms of making a non-FSGSBASE system
> behave more similar to one with FSGSBASE (although I would buy that removal).

Simplicity. The current behavior is IMO nuts.

> Well, if we're okay with such divergence, maybe that's it.
>
>> As far as I can tell, the only downside is that, on a non-FSGSBASE kernel, setting only the base if the selector already has a nonzero value wonât work, but I would be quite surprised if this breaks anything.
>
>
>