Re: [PATCH v3 6/6] x86, mm: Support huge KVA mappings on x86

From: Andrew Morton
Date: Tue Mar 03 2015 - 17:44:30 EST


On Tue, 3 Mar 2015 10:44:24 -0700 Toshi Kani <toshi.kani@xxxxxx> wrote:

> This patch implements huge KVA mapping interfaces on x86.
>
> On x86, MTRRs can override PAT memory types with a 4KB granularity.
> When using a huge page, MTRRs can override the memory type of the
> huge page, which may lead to a performance penalty. The processor
> can also behave in an undefined manner if a huge page is mapped to
> a memory range that MTRRs have mapped with multiple different memory
> types. Therefore, the mapping code falls back to a smaller page
> size, down to 4KB, when a mapping range is covered by non-WB type
> MTRRs. WB type MTRRs have no effect on the PAT memory types.
>
> pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see
> if a given range is covered by MTRRs. MTRR_TYPE_WRBACK indicates
> that the range is either covered by WB or not covered and the MTRR
> default value is set to WB. 0xFF indicates that MTRRs are disabled.
>
> HAVE_ARCH_HUGE_VMAP is selected when X86_64 or X86_32 with X86_PAE
> is set. X86_32 without X86_PAE is not supported since such a config
> is unlikely to benefit from this feature, and there was an issue
> found in testing.
>
> ...
>
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> +{
> +	u8 mtrr;
> +
> +	/*
> +	 * Do not use a huge page when the range is covered by non-WB type
> +	 * of MTRRs.
> +	 */
> +	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> +	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
> +		return 0;

It would be good to notify the operator in some way when this happens.
Otherwise the kernel will run more slowly and there's no way of knowing
why. I guess slap a pr_info() in there. Or maybe pr_warn()?
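
To make the suggestion concrete, here is a rough userspace model of the
check with a one-time message added. It is a sketch only:
huge_map_allowed(), huge_fallback_warned and MTRR_DISABLED are names
made up for illustration, and in the kernel the fprintf() would
presumably become pr_warn_once() or similar so that repeated fallbacks
do not flood the log.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MTRR_TYPE_UNCACHABLE	0x00
#define MTRR_TYPE_WRBACK	0x06	/* write-back memory type */
#define MTRR_DISABLED		0xFF	/* hypothetical name for the patch's 0xFF */

/* Hypothetical once-only flag; models what pr_warn_once() does in-kernel. */
static int huge_fallback_warned;

/*
 * Userspace model of the MTRR check in pud_set_huge()/pmd_set_huge().
 * Returns 1 if a huge mapping may be used, 0 if the caller must fall
 * back to a smaller page size. Reports the first refusal on stderr.
 */
static int huge_map_allowed(uint8_t mtrr, uint64_t addr, uint64_t size)
{
	assert(size != 0);

	if (mtrr != MTRR_TYPE_WRBACK && mtrr != MTRR_DISABLED) {
		if (!huge_fallback_warned) {
			huge_fallback_warned = 1;
			fprintf(stderr,
				"huge mapping of [%#" PRIx64 "-%#" PRIx64
				"] refused: MTRR type %#x\n",
				addr, addr + size - 1, (unsigned int)mtrr);
		}
		return 0;
	}

	return 1;
}
```

Whether the message should be once-only or rate limited, and at which
log level, is a judgment call; the model just shows where it would go.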

> +	prot = pgprot_4k_2_large(prot);
> +
> +	set_pte((pte_t *)pud, pfn_pte(
> +		(u64)addr >> PAGE_SHIFT,
> +		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
> +
> +	return 1;
> +}
> +
> +int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
> +{
> +	u8 mtrr;
> +
> +	/*
> +	 * Do not use a huge page when the range is covered by non-WB type
> +	 * of MTRRs.
> +	 */
> +	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
> +	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
> +		return 0;
> +
> +	prot = pgprot_4k_2_large(prot);
> +
> +	set_pte((pte_t *)pmd, pfn_pte(
> +		(u64)addr >> PAGE_SHIFT,
> +		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
> +
> +	return 1;
> +}
>
> +int pud_clear_huge(pud_t *pud)
> +{
> +	if (pud_large(*pud)) {
> +		pud_clear(pud);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int pmd_clear_huge(pmd_t *pmd)
> +{
> +	if (pmd_large(*pmd)) {
> +		pmd_clear(pmd);
> +		return 1;
> +	}
> +
> +	return 0;
> +}

I didn't see anywhere where the return values of these functions are
documented. It's all fairly obvious, but we could help the readers
a bit.

