Re: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks

From: Dave Hansen
Date: Fri Feb 11 2022 - 17:19:57 EST


On 1/30/22 13:18, Rick Edgecombe wrote:
> Shadow stack accesses are writes from handle_mm_fault() perspective. So to
> generate the correct PTE, maybe_mkwrite() will rely on the presence of
> VM_SHADOW_STACK or VM_WRITE in the vma.
>
> In future patches, when VM_SHADOW_STACK is actually creatable by
> userspace, a problem could happen if a user calls
> mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code
> would then be confused in the event of shadow stack accesses, and create a
> writable PTE for a shadow stack access. Then the process would fault in a
> loop.
>
> Prevent this from happening by blocking this kind of memory (VM_WRITE and
> VM_SHADOW_STACK) from being created, instead of complicating the fault
> handler logic to handle it.
>
> Add an x86 arch_validate_flags() implementation to handle the check.
> Rename the uapi/asm/mman.h header guard to be able to use it for
> arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

It would be great if this also said:

There is an existing arch_validate_flags() hook for mmap() and
mprotect() which allows architectures to reject unwanted
->vm_flags combinations. Add an implementation for x86.

That's somewhat implied by what is already there, but it would be nice
to make it clearer. The bar for adding a new arch hook is much higher
than for just implementing an existing one.
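
To illustrate what "implementing an existing hook" means here, below is
a rough userspace sketch of the override pattern: a generic fallback that
accepts everything, unless the architecture has already defined its own
version. This is not the kernel code. Every sketch_ name, the bit values,
and the sketch_apply_flags() caller are made up for illustration; the
caller only stands in for the mmap()/mprotect() paths.

```c
#include <stdbool.h>
#include <errno.h>

/* Placeholder bit values for illustration only; not the kernel's definitions. */
#define SKETCH_VM_WRITE        0x2UL
#define SKETCH_VM_SHADOW_STACK 0x100UL

/* "Architecture" override: reject writable shadow stack mappings. */
static inline bool sketch_arch_validate_flags(unsigned long vm_flags)
{
	if ((vm_flags & SKETCH_VM_SHADOW_STACK) && (vm_flags & SKETCH_VM_WRITE))
		return false;

	return true;
}
/* Defining the name as a macro tells the generic code an override exists. */
#define sketch_arch_validate_flags sketch_arch_validate_flags

/* Generic fallback, compiled only when no override was defined above. */
#ifndef sketch_arch_validate_flags
static inline bool sketch_arch_validate_flags(unsigned long vm_flags)
{
	return true;
}
#endif

/* Hypothetical caller standing in for the mmap()/mprotect() paths. */
static inline int sketch_apply_flags(unsigned long *vma_flags,
				     unsigned long newflags)
{
	if (!sketch_arch_validate_flags(newflags))
		return -EINVAL;

	*vma_flags = newflags;
	return 0;
}
```

With this shape, the generic caller needs no changes when an architecture
starts rejecting a flag combination.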


> diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
> new file mode 100644
> index 000000000000..b44fe31deb3a
> --- /dev/null
> +++ b/arch/x86/include/asm/mman.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_MMAN_H
> +#define _ASM_X86_MMAN_H
> +
> +#include <linux/mm.h>
> +#include <uapi/asm/mman.h>
> +
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static inline bool arch_validate_flags(unsigned long vm_flags)
> +{
> +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> +		return false;
> +
> +	return true;
> +}

The design decision here seems to be that VM_SHADOW_STACK is itself a
pseudo-VM_WRITE flag. Like you said: "Shadow stack accesses are writes
from handle_mm_fault()".

Very early on, this series seems to have made the decision that shadow
stacks are writable and need lots of write handling behavior, *BUT*
shouldn't have VM_WRITE set. As a whole, that seems odd.

The alternative would be *requiring* that VM_WRITE and VM_SHADOW_STACK be
set together. I guess the downside is that pte_mkwrite() would need to be
made to work on shadow stack PTEs.
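
To make the two options concrete, here is a minimal userspace sketch of
both validation policies side by side. The demo_ names and bit values are
hypothetical and are not taken from the series. It only shows the flag
check, not the pte_mkwrite() or fault-handling work that either choice
would imply.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; not the kernel's definitions. */
#define DEMO_VM_WRITE        0x2UL
#define DEMO_VM_SHADOW_STACK 0x100UL

/*
 * Policy in the patch: VM_SHADOW_STACK acts as a pseudo-VM_WRITE, so the
 * two flags must never be set together.
 */
static inline bool demo_validate_exclusive(unsigned long vm_flags)
{
	if ((vm_flags & DEMO_VM_SHADOW_STACK) && (vm_flags & DEMO_VM_WRITE))
		return false;

	return true;
}

/*
 * Alternative: VM_SHADOW_STACK is only valid when VM_WRITE is also set.
 * Mappings without VM_SHADOW_STACK are unaffected.
 */
static inline bool demo_validate_paired(unsigned long vm_flags)
{
	if (vm_flags & DEMO_VM_SHADOW_STACK)
		return (vm_flags & DEMO_VM_WRITE) != 0;

	return true;
}
```

The two policies agree on ordinary mappings and give opposite answers for
every mapping that has VM_SHADOW_STACK set.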

That particular design decision was never discussed. I think it has a
really big impact on the rest of the series. What do you think? Was it
a good idea? Or would the alternative be more complicated than what you
have now?