Re: [RFC] x86/mm/KASLR: Remap GDTs at fixed location

From: Andy Lutomirski
Date: Sat Jan 07 2017 - 11:02:55 EST


On Fri, Jan 6, 2017 at 11:35 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Thomas Garnier <thgarnie@xxxxxxxxxx> wrote:
>
>> > No, and I had the way this worked on 64-bit wrong. LTR requires an
>> > available TSS and changes it to busy. So here are my thoughts on how
>> > this should work:
>> >
>> > Let's get rid of any connection between this code and KASLR. Every
>> > time KASLR makes something work differently, a kitten turns all
>> > Schrödinger on us. This is moving the GDT to the fixmap, plain and
>> > simple. For now, make it one page per CPU and don't worry about the
>> > GDT limit.
>>
>> I am all for this change but that's more significant.
>>
>> Ingo: What do you think about that?
>
> I agree with Andy: as I alluded to earlier as well this should be an unconditional
> change (tested properly, etc.) that robustifies the GDT mapping for everyone. That
> KASLR kernels improve too is a happy side effect!
>
>> > On 32-bit, we're going to have to make the fixmap GDT be read-write because
>> > making it read-only will break double-fault handling.
>> >
>> > On 64-bit, we can use your trick of temporarily mapping the GDT read-write
>> > every time we load TR, which should happen very rarely. Alternatively, we can
>> > reload the *GDT* every time we reload TR, which should be comparably slow.
>> > This is going to regress performance in the extremely rare case where KVM
>> > exits to a process that uses ioperm() (I think), but I doubt anyone cares. Or
>> > maybe we could arrange to never reload TR when GDT points at the fixmap by
>> > having KVM set the host GDT to the direct version and letting KVM's code that
>> > reloads the GDT switch to the fixmap copy.
>
> Please check whether the LTR write generates a page fault to a RO PTE even if the
> busy bit is already set. LTR is pretty slow which suggests that it's microcode,
> and microcode is usually not sloppy about such things: i.e. LTR would only
> generate an unconditional write if there's a compatibility dependency on it. But I
> could easily be wrong ...

The SDM says:

IF segment descriptor is not for an available TSS
THEN #GP(segment selector); FI;

so I think it's #GP not #PF.
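
To make the ordering in that pseudocode concrete, here is a rough
user-space C model of the type check. It is only a sketch: model_ltr(),
enum ltr_result and the DESC_* names are made up for illustration, it
looks only at the type field in the low 8 bytes of the descriptor, and
it skips every other check LTR performs (selector validity, present bit,
and so on). The type values 0x9 (available) and 0xB (busy) are the 64-bit
TSS encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* System-segment type field lives in descriptor bits 43:40. */
#define DESC_TYPE_SHIFT         40
#define DESC_TYPE_MASK          (0xFull << DESC_TYPE_SHIFT)
#define DESC_TYPE_TSS_AVAILABLE 0x9u  /* 64-bit TSS, available */
#define DESC_TYPE_TSS_BUSY      0xBu  /* 64-bit TSS, busy */
#define DESC_BUSY_BIT           (0x2ull << DESC_TYPE_SHIFT)

/* Hypothetical result codes for this model only. */
enum ltr_result { LTR_OK, LTR_GP };

static unsigned int desc_type(uint64_t desc_low)
{
	return (unsigned int)((desc_low & DESC_TYPE_MASK) >> DESC_TYPE_SHIFT);
}

/*
 * Model of the quoted SDM step: raise #GP unless the descriptor is an
 * available TSS, otherwise set the busy bit. *wrote_gdt reports whether
 * the model stored to the descriptor at all.
 */
static enum ltr_result model_ltr(uint64_t *desc_low, bool *wrote_gdt)
{
	*wrote_gdt = false;

	if (desc_type(*desc_low) != DESC_TYPE_TSS_AVAILABLE)
		return LTR_GP;          /* fault is taken before any store */

	*desc_low |= DESC_BUSY_BIT;     /* available -> busy */
	*wrote_gdt = true;
	return LTR_OK;
}
```

In this model, a descriptor that is already busy takes the #GP path
before any store to the GDT, and the store only happens for an available
TSS. Whether that store would then hit a read-only mapping is outside
what the model covers.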

>
>> > If we need a quirk to keep the fixmap copy read-write, so be it.
>> >
>> > None of this should depend on KASLR. IMO it should happen unconditionally.
>>
>> I looked back at the fixmap, and I can see a way it could be done
>> (using NR_CPUS) like the other fixmap ranges. It would limit the
>> number of cpus to 512 (there is 2M memory left on fixmap on the
>> default configuration). That's if we never add any other fixmap on
>> x64. I don't know if it is an acceptable number and if the fixmap
>> region could be increased. (128 if we do your kvm trick, of course).
>>
>> Ingo: What do you think?
>
> I think we should scale the fixmap size flexibly with NR_CPUs on 64-bit, and we
> should limit CPUs on 32-bit to a reasonable value.

Unless the headers are even more tangled than usual, I think we could
just handle this exactly. The top and bottom of the fixmap are known
exactly at compile time, so we should just be able to make the next mm
range adjust its start point accordingly. The main issue right now
seems to be that MODULES_END is hard-coded.
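
Roughly what I have in mind, as a standalone sketch and not a patch.
FIX_GDT_BEGIN, FIX_GDT_END, FIX_OTHER_SLOTS_END, END_OF_FIXED_ADDRESSES,
SKETCH_NEXT_RANGE_END and gdt_fixmap_addr() are hypothetical names, the
slot count before the GDT range and the FIXADDR_TOP value are
illustrative, and any alignment the neighbouring range needs is ignored.
It assumes one 4K page per CPU, as proposed earlier in the thread.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (1ULL << PAGE_SHIFT)
#define NR_CPUS      512                      /* example configuration */
#define FIXADDR_TOP  0xffffffffff7ff000ULL    /* illustrative value */

enum fixed_addresses {
	FIX_OTHER_SLOTS_END = 15,   /* hypothetical: slots already in use */
	FIX_GDT_BEGIN,              /* one GDT page per possible CPU */
	FIX_GDT_END = FIX_GDT_BEGIN + NR_CPUS - 1,
	END_OF_FIXED_ADDRESSES
};

/* The fixmap size follows from the enum, so it scales with NR_CPUS. */
#define FIXADDR_SIZE   ((uint64_t)END_OF_FIXED_ADDRESSES << PAGE_SHIFT)
#define FIXADDR_START  (FIXADDR_TOP - FIXADDR_SIZE)

/* The range below the fixmap ends where the fixmap starts, not at a
 * hard-coded address. */
#define SKETCH_NEXT_RANGE_END  FIXADDR_START

_Static_assert(FIX_GDT_END - FIX_GDT_BEGIN + 1 == NR_CPUS,
	       "GDT fixmap range must cover every possible CPU");

/* Fixmap slots grow downward from FIXADDR_TOP. */
static uint64_t gdt_fixmap_addr(unsigned int cpu)
{
	return FIXADDR_TOP - ((uint64_t)(FIX_GDT_BEGIN + cpu) << PAGE_SHIFT);
}
```

With NR_CPUS at 512 the GDT range alone is 512 * 4K = 2M, which matches
the figure quoted above. With a larger NR_CPUS the fixmap grows and the
boundary of the range below it moves down by the same amount.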

For 32-bit, the task gate in #DF is going to be a show-stopper for an
RO fixmap, I think. I'm still slightly in favor of enabling the code
but with an RW mapping on 32-bit, though. And anyone running hundreds
to thousands of CPUs on 32-bit is nuts anyway.

--Andy