[PATCH 03/25] x86/sgx: Support VMA permissions exceeding enclave permissions

From: Reinette Chatre
Date: Wed Dec 01 2021 - 14:24:52 EST


=== Summary ===

An SGX VMA can only be created if its permissions are the same or
weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
creation this rule continues to be enforced by the page fault handler.

With SGX2 the EPCM permissions of a page can change after VMA
creation, resulting in the VMA permissions exceeding the EPCM
permissions and the page fault handler incorrectly blocking access.

Enable the VMA's pages to remain accessible while ensuring that
the page table entries are installed to match the EPCM permissions
without exceeding the VMA permissions.

=== Full Changelog ===

An SGX enclave is an area of memory where parts of an application
can reside. An enclave is first created and loaded (from
non-enclave memory) with the code and data of an application;
user space can then map (mmap()) the enclave memory in order to
enter the enclave at its defined entry points and execute within it.
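
As a rough illustration of that last step (not part of this patch,
and assuming the enclave has already been created and initialized
through the driver's ioctl() interface), user space obtains such a
mapping from the SGX driver device. 'enclave_base' and 'size' are
hypothetical placeholders for values chosen at enclave creation:

    #include <fcntl.h>
    #include <sys/mman.h>

    /*
     * Sketch only: map part of an already built and initialized
     * enclave via the SGX driver. The requested PROT_* bits must not
     * exceed the permissions recorded in the EPCM when the pages
     * were added.
     */
    int fd = open("/dev/sgx_enclave", O_RDWR);
    void *code = mmap((void *)enclave_base, size, PROT_READ | PROT_EXEC,
                      MAP_SHARED | MAP_FIXED, fd, 0);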

The hardware maintains a secure structure, the Enclave Page Cache Map
(EPCM), that tracks the contents of the enclave. Of interest here is
its tracking of the enclave page permissions. When a page is loaded
into the enclave, its permissions are specified and recorded in the
EPCM. In parallel, the OS maintains the page table permissions, and
the rule is that page table permissions are never allowed to exceed
the EPCM permissions.

A new mapping (mmap()) of enclave memory can only succeed if the
mapping has the same or weaker permissions than the permissions that
were vetted during enclave creation. This is enforced by
sgx_encl_may_map(), which is called on both the mmap() and mprotect()
paths. This permission verification remains in place.
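
The "same or weaker" requirement is effectively a bitwise subset test
on the R/W/X bits. A minimal stand-alone sketch of that test
(simplified illustration only; not the kernel's sgx_encl_may_map()
code or types):

    #include <assert.h>

    #define P_READ  0x1UL
    #define P_WRITE 0x2UL
    #define P_EXEC  0x4UL

    /* Return non-zero if the requested bits do not exceed the EPCM bits. */
    static int may_map(unsigned long epcm_bits, unsigned long vma_bits)
    {
            return (~epcm_bits & vma_bits) == 0;
    }

    int main(void)
    {
            /* R mapping of an RX page: allowed. */
            assert(may_map(P_READ | P_EXEC, P_READ));
            /* RW mapping of an RX page: rejected. */
            assert(!may_map(P_READ | P_EXEC, P_READ | P_WRITE));
            return 0;
    }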

One feature of SGX2 is support for modifying enclave page permissions
after enclave creation. Enclave pages may thus already be mapped at
the time their enclave permissions are changed, resulting in the VMA's
permissions potentially exceeding the enclave page permissions.

Enable permissions of existing VMAs to exceed enclave page permissions
in preparation for dynamic enclave page permission changes.
New VMAs that attempt to exceed enclave page permissions continue to be
unsupported.

The reasons why permissions of existing VMAs are allowed to exceed
enclave page permissions, instead of dynamically changing VMA
permissions whenever enclave page permissions change, are:
1) Changing VMA permissions involves splitting VMAs, which is an
operation that can fail. Additionally, the actual changing of page
permissions of a range of pages could also fail on any of the pages
involved. Handling these error cases causes problems. For example, if
an enclave page permission change fails and the VMA has already been
split, then it is not possible to undo the VMA split, nor is it
possible to undo the enclave page permission changes that did succeed
before the failure.
2) The OS has little insight into the user space from which EPCM
permissions are controlled. For example, a RW page may be made RO
just before it is made RX, and splitting the VMAs while they may
change again soon is unnecessary.

Remove the extra permission check, performed on a page fault
(vm_operations_struct->fault) or during debugging
(vm_operations_struct->access) when loading the enclave page from
swap, that ensures the VMA permissions do not exceed the enclave
permissions. Since a VMA could only exist if it passed the original
permission checks during mmap(), and a VMA may now indeed exceed the
page permissions, this extra permission check is no longer
appropriate.

With the permission check removed, ensure that page table entries do
not blindly inherit the VMA permissions but instead reflect the
permissions that the VMA and enclave agree on. PTEs for pages that
are writable from both the VMA and enclave perspective are installed
with the writable bit set, reducing the need for the additional flow
to the permission mismatch cases handled next.
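
A worked example of the resulting PTE protection (hypothetical
values, shown only to illustrate the intersection and the added
VM_SHARED bit; not part of the patch itself):

    /*
     * VMA:  VM_READ | VM_WRITE | VM_EXEC  (mapped MAP_SHARED)
     * EPCM: VM_READ | VM_WRITE            (page built as RW)
     *
     * vm_prot_bits    = VM_READ | VM_WRITE | VM_EXEC
     * page_prot_bits  = VM_READ | VM_WRITE          (EPCM & VMA)
     * page_prot_bits |= VM_SHARED
     *
     * vm_get_page_prot() then yields a writable, non-executable
     * protection, so the PTE is writable on the first fault and no
     * second fault is needed just to set the write bit. Without
     * VM_SHARED, a plain VM_WRITE mapping would map to a
     * copy-on-write (read-only) protection, but there is no COW for
     * enclave pages.
     */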

Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
---
arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..20e97d3abdce 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
}

static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
- unsigned long addr,
- unsigned long vm_flags)
+ unsigned long addr)
{
- unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
struct sgx_epc_page *epc_page;
struct sgx_encl_page *entry;

@@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
if (!entry)
return ERR_PTR(-EFAULT);

- /*
- * Verify that the faulted page has equal or higher build time
- * permissions than the VMA permissions (i.e. the subset of {VM_READ,
- * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
- */
- if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
- return ERR_PTR(-EFAULT);
-
/* Entry successfully located. */
if (entry->epc_page) {
if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
@@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
{
unsigned long addr = (unsigned long)vmf->address;
struct vm_area_struct *vma = vmf->vma;
+ unsigned long page_prot_bits;
struct sgx_encl_page *entry;
+ unsigned long vm_prot_bits;
unsigned long phys_addr;
struct sgx_encl *encl;
vm_fault_t ret;
@@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)

mutex_lock(&encl->lock);

- entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
+ entry = sgx_encl_load_page(encl, addr);
if (IS_ERR(entry)) {
mutex_unlock(&encl->lock);

@@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)

phys_addr = sgx_get_epc_phys_addr(entry->epc_page);

- ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
+ /*
+ * Insert PTE to match the EPCM page permissions ensured to not
+ * exceed the VMA permissions.
+ */
+ vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+ page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
+ /*
+ * Add VM_SHARED so that PTE is made writable right away if VMA
+ * and EPCM are writable (no COW in SGX).
+ */
+ page_prot_bits |= (vma->vm_flags & VM_SHARED);
+ ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
+ vm_get_page_prot(page_prot_bits));
if (ret != VM_FAULT_NOPAGE) {
mutex_unlock(&encl->lock);

@@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
* Load an enclave page to EPC if required, and take encl->lock.
*/
static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
- unsigned long addr,
- unsigned long vm_flags)
+ unsigned long addr)
{
struct sgx_encl_page *entry;

for ( ; ; ) {
mutex_lock(&encl->lock);

- entry = sgx_encl_load_page(encl, addr, vm_flags);
+ entry = sgx_encl_load_page(encl, addr);
if (PTR_ERR(entry) != -EBUSY)
break;

@@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
return -EFAULT;

for (i = 0; i < len; i += cnt) {
- entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
- vma->vm_flags);
+ entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
if (IS_ERR(entry)) {
ret = PTR_ERR(entry);
break;
--
2.25.1