Re: [PATCH] x86/mm/ptdump: Fix soft lockup in page table walker.

From: Dmitry Vyukov
Date: Fri Feb 10 2017 - 08:28:45 EST


On Fri, Feb 10, 2017 at 1:15 PM, Andrey Ryabinin
<aryabinin@xxxxxxxxxxxxx> wrote:
>
>
> On 02/10/2017 02:18 PM, Thomas Gleixner wrote:
>> On Fri, 10 Feb 2017, Dmitry Vyukov wrote:
>>> This is the right thing to do per se, but I am concerned that now
>>> people will just suffer from a slow boot (it can take literally
>>> minutes) and will not realize the root cause nor that it's fixable
>>> (e.g. with rodata=n) and will probably just blame KASAN for slowness.
>>>
>>> Could we default this rodata check to n under KASAN? Or at least print
>>> some explanatory warning message before marking rodata (it
>>> should be printed right before "hang", so if you stare at it for a
>>> minute during each boot you realize that it may be related)? Or
>>> something along these lines. FWIW in my builds I just always disable
>>> the check.
>>
>> That certainly makes sense and we emit such warnings in other places
>> already (lockdep, trace_printk ...)
>>
>
> Agreed, but perhaps it would be better to make this code faster for KASAN=y?
> The main problem here is that many pgd entries point to the same kasan_zero_pud table,
> so the ptdump walker checks kasan_zero_pud many times.
> Instead, we could check it only once and skip the remaining kasan_zero_pud entries.
>
> I can't say I like this hack very much, but it saves me almost 20 seconds of boot time.
> Any objections?


Now I remember that we already discussed this in an earlier thread:
https://lkml.org/lkml/2016/11/8/775

Andrey, you proposed:

"I didn't look at any code, but we probably could can remember last
visited pgd and skip next pgd if it's the same as previous."

Do you still think it's a good idea?
Walking the same pgd multiple times does not make sense (right?). And
it could probably speed up non-KASAN builds to some degree in some
contexts. And the code would be free of additional #ifdefs.
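
Something like this rough, untested sketch of the loop in
ptdump_walk_pgd_level_core() is what I have in mind (prev_pgd and
prev_pgd_set are made-up names; note that skipping entries changes the
dumped ranges, so it probably only makes sense for the checkwx pass
unless note_page() is taught about the skip):

	pgd_t prev_pgd;
	bool prev_pgd_set = false;

	for (i = 0; i < PTRS_PER_PGD; i++) {
		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
		if (!pgd_none(*start) && !is_hypervisor_range(i)) {
			/*
			 * Consecutive pgd entries often point to the same
			 * lower-level table (e.g. kasan_zero_pud), and
			 * re-walking an identical entry finds nothing new.
			 * Remember the last value we walked and skip repeats.
			 */
			if (prev_pgd_set &&
			    pgd_val(*start) == pgd_val(prev_pgd)) {
				start++;
				continue;
			}
			prev_pgd = *start;
			prev_pgd_set = true;

			/* ... existing pgd_large()/pud-walk code ... */
		}
		start++;
	}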



> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 8aa6bea..0fbae1d 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -13,6 +13,7 @@
>   */
>
>  #include <linux/debugfs.h>
> +#include <linux/kasan.h>
>  #include <linux/mm.h>
>  #include <linux/init.h>
>  #include <linux/sched.h>
> @@ -121,6 +122,30 @@ static struct addr_marker address_markers[] = {
>  			seq_printf(m, fmt, ##args);	\
>  })
>
> +
> +#ifdef CONFIG_KASAN
> +static bool kasan_pgd_checked(pgd_t pgd, bool checkwx)
> +{
> +	static bool kasan_zero_pgd_checked = false;
> +	pgd_t kasan_zero_pgd = __pgd(__pa(kasan_zero_pud) | _PAGE_TABLE);
> +
> +	if (!checkwx)
> +		return false;
> +
> +	if (pgd_val(pgd) == pgd_val(kasan_zero_pgd)) {
> +		if (kasan_zero_pgd_checked)
> +			return true;
> +		kasan_zero_pgd_checked = true;
> +	}
> +	return false;
> +}
> +#else
> +static inline bool kasan_pgd_checked(pgd_t pgd, bool checkwx)
> +{
> +	return false;
> +}
> +#endif
> +
>  /*
>   * Print a readable form of a pgprot_t to the seq_file
>   */
> @@ -396,7 +421,8 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
>
>  	for (i = 0; i < PTRS_PER_PGD; i++) {
>  		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
> -		if (!pgd_none(*start) && !is_hypervisor_range(i)) {
> +		if (!pgd_none(*start) && !is_hypervisor_range(i) &&
> +		    !kasan_pgd_checked(*start, checkwx)) {
>  			if (pgd_large(*start) || !pgd_present(*start)) {
>  				prot = pgd_flags(*start);
>  				note_page(m, &st, __pgprot(prot), 1);
>
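
As for the explanatory message I mentioned above, I was thinking of
something as simple as this (untested, wording is only an example) in
the checkwx entry point:

	void ptdump_walk_pgd_level_checkwx(void)
	{
		/*
		 * With KASAN the W+X check also walks the huge shadow
		 * mapping and can take minutes; say so up front so the
		 * apparent hang is not blamed on something else.
		 */
		if (IS_ENABLED(CONFIG_KASAN))
			pr_info("x86/mm: checking W+X mappings; this can take a while with CONFIG_KASAN=y (rodata=n skips the check)\n");

		ptdump_walk_pgd_level_core(NULL, NULL, true);
	}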