Re: Simplify load_unaligned_zeropad() (was Re: [GIT PULL] Ceph updates for 5.20-rc1)

From: Peter Zijlstra
Date: Mon Aug 15 2022 - 03:18:49 EST


On Sun, Aug 14, 2022 at 02:14:08PM -0700, Linus Torvalds wrote:

> PeterZ - you've touched both the load_unaligned_zeropad() and the
> exception code last, so let's run this past you, but this really does
> seem to not only fix the code generation issue in fs/dcache.s, it just
> looks simpler too. Comments?

Ha, freshly back from vacation and I barely know what a computer is :-)

> arch/x86/include/asm/extable_fixup_types.h | 2 ++
> arch/x86/include/asm/word-at-a-time.h | 50 +++---------------------------
> arch/x86/mm/extable.c | 30 ++++++++++++++++++
> 3 files changed, 37 insertions(+), 45 deletions(-)
>
> diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
> index 503622627400..b53f1919710b 100644
> --- a/arch/x86/include/asm/extable_fixup_types.h
> +++ b/arch/x86/include/asm/extable_fixup_types.h
> @@ -64,4 +64,6 @@
> #define EX_TYPE_UCOPY_LEN4 (EX_TYPE_UCOPY_LEN | EX_DATA_IMM(4))
> #define EX_TYPE_UCOPY_LEN8 (EX_TYPE_UCOPY_LEN | EX_DATA_IMM(8))

(weird tab stuff there, but that's for another day I suppose)

> +#define EX_TYPE_ZEROPAD 20 /* load ax from dx zero-padded */
> +
> #endif
> diff --git a/arch/x86/include/asm/word-at-a-time.h b/arch/x86/include/asm/word-at-a-time.h
> index 8338b0432b50..4893f1b30dd6 100644
> --- a/arch/x86/include/asm/word-at-a-time.h
> +++ b/arch/x86/include/asm/word-at-a-time.h
> @@ -77,58 +77,18 @@ static inline unsigned long find_zero(unsigned long mask)
> * and the next page not being mapped, take the exception and
> * return zeroes in the non-existing part.
> */
> static inline unsigned long load_unaligned_zeropad(const void *addr)
> {
> unsigned long ret;
>
> + asm volatile(
> + "1: mov (%[addr]), %[ret]\n"
> "2:\n"
> + _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_ZEROPAD)
> + : [ret] "=a" (ret)
> + : [addr] "d" (addr));
>
> return ret;
> }
>
> #endif /* _ASM_WORD_AT_A_TIME_H */
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 331310c29349..58c79077a496 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -41,6 +41,34 @@ static bool ex_handler_default(const struct exception_table_entry *e,
> return true;
> }
>
> +/*
> + * This is the *very* rare case where we do a "load_unaligned_zeropad()"
> + * and it's a page crosser into a non-existent page.
> + *
> + * This happens when we optimistically load a pathname a word-at-a-time
> + * and the name is less than the full word and the next page is not
> + * mapped. Typically that only happens for CONFIG_DEBUG_PAGEALLOC.
> + *
> + * NOTE! The load is always of the form "mov (%edx),%eax" to make the
> + * fixup simple.

So obviously we could use _ASM_EXTABLE_TYPE_REG together with something
like: "mov (%[reg]), %[reg]" to not depend on these fixed registers, but
yeah, that doesn't seem needed. Code-gen is fine as is.

> + */
> +static bool ex_handler_zeropad(const struct exception_table_entry *e,
> + struct pt_regs *regs,
> + unsigned long fault_addr)
> +{
> + const unsigned long mask = sizeof(long) - 1;
> + unsigned long offset, addr;
> +
> + offset = regs->dx & mask;
> + addr = regs->dx & ~mask;
> + if (fault_addr != addr + sizeof(long))
> + return false;
> +
> + regs->ax = *(unsigned long *)addr >> (offset * 8);


> + regs->ip = ex_fixup_addr(e);
> + return true;

I think the convention here is to do:

return ex_handler_default(e, regs);

instead, that ensures there a bit of common post code.

> +}
> +
> static bool ex_handler_fault(const struct exception_table_entry *fixup,
> struct pt_regs *regs, int trapnr)
> {

But yeah, looks good to me.