Re: Review of KPTI patchset

From: Thomas Gleixner
Date: Sat Dec 30 2017 - 17:04:07 EST


On Sat, 30 Dec 2017, Mathieu Desnoyers wrote:
> ----- On Dec 30, 2017, at 2:58 PM, Thomas Gleixner tglx@xxxxxxxxxxxxx wrote:
> > /*
> > * Called on fork from arch_dup_mmap(). Just copy the current LDT state,
> > * the new task is not running, so nothing can be installed.
> > */
> > int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
> > {
> > struct ldt_struct *new_ldt;
> > int retval = 0;
> >
> > if (!old_mm)
> > return 0;
>
> If old_mm is NULL, free_ldt_pgtables(mm) is not called.

Correct. Why should it be called? Nothing is allocated. You cannot
associate anything to a NULL pointer and you cannot duplicate the LDT
referenced by a NULL pointer, right?

> > mutex_lock(&old_mm->context.lock);
> > if (!old_mm->context.ldt)
>
> If old_mm->context.ldt is NULL, free_ldt_pgtables(mm) is not called.

Again. Why would it be called? There is no page table allocated yet. Care
to look at the calling context or the comment above the function which
explains it?

That's fork: old_mm is the parent mm and mm is the child mm. So how would
the new child mm have that LDT pagetable _before_ it was ever tried to
allocate/map?

> > goto out_unlock;
> >
> > new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
> > if (!new_ldt) {
> > retval = -ENOMEM;
>
> On allocation error, free_ldt_pgtables(mm) is not called.

Once more. There is no page table allocated yet.

> > goto out_unlock;
> > }
> >
> > memcpy(new_ldt->entries, old_mm->context.ldt->entries,
> > new_ldt->nr_entries * LDT_ENTRY_SIZE);
> > finalize_ldt_struct(new_ldt);
> >
> > retval = map_ldt_struct(mm, new_ldt, 0);
> > if (retval) {
> > free_ldt_pgtables(mm);
>
> Here, if we fail to map_ldt_struct, then free_ldt_pgtables(mm) is called.
> > free_ldt_struct(new_ldt);
>
> In addition to call free_ldt_struct(), but map_ldt_struct failed... ?

Yes, because we failed to map it and free_ldt_pgtables() cleans up the
leftovers of the map function, which can exit with error and a half
populated page table.

free_ldt_struct() must be called otherwise we leak new_ldt which got
allocated above.

> This lack of symmetry makes me uncomfortable, and it may hint at something
> fishy.

Its not asymetric at all.

The only asymetry is in the error path of write_ldt() which can leak a half
allocated page table. But, that's a nasty one because if there is an
existing LDT mapped, then the pagetable cannot be freed. So yes, it's not
nice, but harmless and needs some thought to fix.

in the ldt_dup() case we can remove it right away because nothing has been
exposed at that point.

> >> + this_cpu_write(cpu_tlbstate.invalidate_other, false);
> >> +}
> >>
> >> Can this be called with preemption enabled ? If so, what happens
> >> if migrated ?
> >
> > No, it can't and if it is then it's a bug and the smp_processor_id() debug
> > code will yell at you.
>
> I thought the whole point about this_cpu_*() was that it could be called
> with preemption enabled, given that it figures out the per-cpu data offset
> using a segment selector prefix. How would smp_processor_id() debug code be
> involved here ?

You're right, the this_cpu_read/write wont. But the this_cpu_ptr()
dereference in one of the invoked functions will.

Granted, it's not obvious and ideally we convert those this_cpu_read/writes
to __this_cpu_read/writes() to get the immediate fail reported on the first
access.

Thanks,

tglx