Re: [PATCH v6 2/9] x86, kfence: enable KFENCE for x86

From: Jann Horn
Date: Thu Oct 29 2020 - 22:49:51 EST


On Thu, Oct 29, 2020 at 2:17 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> Add architecture specific implementation details for KFENCE and enable
> KFENCE for the x86 architecture. In particular, this implements the
> required interface in <asm/kfence.h> for setting up the pool and
> providing helper functions for protecting and unprotecting pages.
>
> For x86, we need to ensure that the pool uses 4K pages, which is done
> using the set_memory_4k() helper function.
>
> Reviewed-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Co-developed-by: Marco Elver <elver@xxxxxxxxxx>
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> Signed-off-by: Alexander Potapenko <glider@xxxxxxxxxx>
[...]
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
[...]
> @@ -725,6 +726,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> if (IS_ENABLED(CONFIG_EFI))
> efi_recover_from_page_fault(address);
>
> + if (kfence_handle_page_fault(address))
> + return;

We can also get to this point due to an attempt to execute a data
page. That's very unlikely (given that the same thing would also crash
if you tried to do it with normal heap memory, and KFENCE allocations
are extremely rare); but we might want to try to avoid handling such
faults as KFENCE faults, since KFENCE will assume that it has resolved
the fault and retry execution of the faulting instruction. Once kernel
protection keys are introduced, those might cause the same kind of
trouble.

So we might want to gate this on a check like "if ((error_code &
X86_PF_PROT) == 0)" (meaning "only handle the fault if the fault was
caused by no page being present", see enum x86_pf_error_code).


Unrelated sidenote: Since we're hooking after exception fixup
handling, the debug-only KFENCE_STRESS_TEST_FAULTS can probably still
cause some behavioral differences through spurious faults in places
like copy_user_enhanced_fast_string (where the exception table entries
are used even if the *kernel* pointer, not the user pointer, causes a
fault). But since KFENCE_STRESS_TEST_FAULTS is exclusively for KFENCE
development, the difference might not matter. And ordering them the
other way around definitely isn't possible, because the kernel relies
on being able to fixup OOB reads. So there probably isn't really
anything we can do better here; it's just something to keep in mind.
Maybe you can add a little warning to the help text for that Kconfig
entry that warns people about this?



> +
> oops:
> /*
> * Oops. The kernel tried to access some bad page. We'll have to
> --
> 2.29.1.341.ge80a0c044ae-goog
>