Re: [PATCH v39 13/24] x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES

From: Dave Hansen
Date: Mon Oct 19 2020 - 16:48:43 EST


On 10/17/20 10:03 PM, Jarkko Sakkinen wrote:
> On Fri, Oct 16, 2020 at 02:25:50PM -0700, Dave Hansen wrote:
>>> +/**
>>> + * struct sgx_enclave_add_pages - parameter structure for the
>>> + * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
>>> + * @src: start address for the page data
>>> + * @offset: starting page offset
>>
>> Is this the offset *within* the page? Might be nice to say that.
>
> It's the offset in the enclave address range where page is to be added.

Yikes, comment improvement needed, badly.

>>> +static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
>>> + unsigned long offset,
>>> + u64 secinfo_flags)
>>> +{
>>> + struct sgx_encl_page *encl_page;
>>> + unsigned long prot;
>>> +
>>> + encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
>>> + if (!encl_page)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + encl_page->desc = encl->base + offset;
>>> + encl_page->encl = encl;
>>
>> Somewhere, we need an explanation of why we have 'sgx_epc_page' and
>> 'sgx_encl_page'. I think they're 1:1 at least after
>> sgx_encl_page_alloc(), so I'm wondering why we need two.
>
> You need sgx_encl_page to hold data that exists whether or not there is
> an associated EPC page.

Except they're currently tightly bound:

> encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags);
> if (IS_ERR(encl_page))
> return PTR_ERR(encl_page);
>
> - epc_page = __sgx_alloc_epc_page();
> + epc_page = sgx_alloc_epc_page(encl_page, true);
> if (IS_ERR(epc_page)) {
> kfree(encl_page);
> return PTR_ERR(epc_page);
> }

So, is this because 'sgx_encl_page' continues to exist even if
'sgx_epc_page' is reclaimed?

> Essentially sgx_encl_page contains the data needed for a virtual page,
> and sgx_epc_page what is needed to represent physical page.

So, grumble grumble, that's horribly inefficient for sparse mappings.
There's a reason VMAs cover ranges instead of being allocated
per-virtual-page.

> None of the data contained in sgx_encl_page make sense for sgx_epc_page.
> They don't contain intersecting or redundant data.

Yeah, except they point to each other, so if one isn't necessary, we can
get rid of that pointer.

>>> +static int __sgx_encl_add_page(struct sgx_encl *encl,
>>> + struct sgx_encl_page *encl_page,
>>> + struct sgx_epc_page *epc_page,
>>> + struct sgx_secinfo *secinfo, unsigned long src)
>>> +{
>>> + struct sgx_pageinfo pginfo;
>>> + struct vm_area_struct *vma;
>>> + struct page *src_page;
>>> + int ret;
>>> +
>>> + /* Deny noexec. */
>>> + vma = find_vma(current->mm, src);
>>> + if (!vma)
>>> + return -EFAULT;
>>> +
>>> + if (!(vma->vm_flags & VM_MAYEXEC))
>>> + return -EACCES;
>>> +
>>> + ret = get_user_pages(src, 1, 0, &src_page, NULL);
>>> + if (ret < 1)
>>> + return -EFAULT;
>>> +
>>> + pginfo.secs = (unsigned long)sgx_get_epc_addr(encl->secs.epc_page);
>>> + pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page);
>>> + pginfo.metadata = (unsigned long)secinfo;
>>> + pginfo.contents = (unsigned long)kmap_atomic(src_page);
>>> +
>>> + ret = __eadd(&pginfo, sgx_get_epc_addr(epc_page));
>>
>> Could you convince me that EADD is not going to fault and make the
>> kmap_atomic() mad?
>
> It can legitly fail in the case when power cycle happens.
>
> That's why the inline assembly catches faults and return an error code.
> Thhis code has been field tested a lot. I have fairly good trust on
> it.

OK, so it can fault, but not *sleep*.

Can you comment it to that effect, please?

>>> + if (ret) {
>>> + if (encls_failed(ret))
>>> + ENCLS_WARN(ret, "EEXTEND");
>>> + return -EIO;
>>
>> How frequent should we expect these to be? Can users cause them? You
>> should *proably* call it ENCLS_WARN_ONCE() if it's implemented that way.
>
> If power cycle happens.

So, we get one warning per power cycle? Practically, do you mean a
suspend/resume cycle, or is this more like hibernation-to-disk-resume?

In any case, if this is normal system operation (which closing my laptop
lid qualifies as), it should produce zero warnings.


>>> +static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
>>> + unsigned long offset, struct sgx_secinfo *secinfo,
>>> + unsigned long flags)
>>> +{
>>> + struct sgx_encl_page *encl_page;
>>> + struct sgx_epc_page *epc_page;
>>> + int ret;
>>> +
>>> + encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags);
>>> + if (IS_ERR(encl_page))
>>> + return PTR_ERR(encl_page);
>>> +
>>> + epc_page = __sgx_alloc_epc_page();
>>> + if (IS_ERR(epc_page)) {
>>> + kfree(encl_page);
>>> + return PTR_ERR(epc_page);
>>> + }
>>
>> Looking at these, I'm forgetting why we need to both allocate an
>> encl_page and an epc_page. Commends might remind me. So would better
>> names.
>
> Should the struct names be renamed?
>
> Like sgx_phys_epc_page and sgx_virt_epc_page?

"epc" is additional acronym nonsense and redundant with "sgx" and "page"
anyway.

I'd probably call then 'sgx_phys_page' and 'sgx_virt_slot' or something.

>>> + mmap_read_lock(current->mm);
>>> + mutex_lock(&encl->lock);
>>> +
>>> + /*
>>> + * Insert prior to EADD in case of OOM.
>>
>> I wouldn't say OOM. Maybe:
>>
>> xa_insert() and EADD can both fail. But xa_insert() is easier
>> to unwind so do it first.
>>
>>> EADD modifies MRENCLAVE, i.e.
>>
>> What is MRENCLAVE?
>
> The measurement stored in SECS. I'm wondering with xarray, is it
> possible to preallocate entry without inserting anything?

Let's use plain english here. I don't care what the implementation
does, I just care about what it means to the kernel.

> Then we could get rid of this unwind and also would not need to
> take encl->lock in sgx_encl_may_map().

There was for radix trees, iirc.

>>> + * can't be gracefully unwound, while failure on EADD/EXTEND is limited
>>> + * to userspace errors (or kernel/hardware bugs).
>>> + */
>>> + ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
>>> + encl_page, GFP_KERNEL);
>>> + if (ret)
>>> + goto err_out_unlock;
>>> +
>>> + ret = __sgx_encl_add_page(encl, encl_page, epc_page, secinfo,
>>> + src);
>>> + if (ret)
>>> + goto err_out;
>>> +
>>> + /*
>>> + * Complete the "add" before doing the "extend" so that the "add"
>>> + * isn't in a half-baked state in the extremely unlikely scenario
>>> + * the enclave will be destroyed in response to EEXTEND failure.
>>> + */
>>> + encl_page->encl = encl;
>>> + encl_page->epc_page = epc_page;
>>> + encl->secs_child_cnt++;
>>> +
>>> + if (flags & SGX_PAGE_MEASURE) {
>>> + ret = __sgx_encl_extend(encl, epc_page);
>>> + if (ret)
>>> + goto err_out;
>>> + }
>>
>> Why would we never *not* measure an added page?
>
> You might add Thread Control Structure pages without measuring them or
> data area. There are reasons for the user space not to have everything
> measured.

This is also good comment fodder.

>>> +static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
>>> +{
>>> + struct sgx_enclave_add_pages addp;
>>> + struct sgx_secinfo secinfo;
>>> + unsigned long c;
>>> + int ret;
>>> +
>>> + if ((atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) ||
>>> + !(atomic_read(&encl->flags) & SGX_ENCL_CREATED))
>>> + return -EINVAL;
>>
>> There should to be a nice state machine documented somewhere. Is ther?
>
> So should I document to encl.h where they are declared to start with?

I think it's better placed in the Documentation/.

>>> + if (copy_from_user(&addp, arg, sizeof(addp)))
>>> + return -EFAULT;
>>> +
>>> + if (!IS_ALIGNED(addp.offset, PAGE_SIZE) ||
>>> + !IS_ALIGNED(addp.src, PAGE_SIZE))
>>> + return -EINVAL;
>>> +
>>> + if (!(access_ok(addp.src, PAGE_SIZE)))
>>> + return -EFAULT;
>>
>> This worries me. You're doing an access_ok() check on addp.src because
>> you evidently don't trust it. But, below, it looks to be accessed
>> directly with an offset, bound by addp.length, which I think can be
>>> PAGE_SIZE.
>>
>> I'd feel a lot better if addp.src's value was being passed around as a
>> __user pointer.
>
> I'm not sure if that call is even needed. Each page is pinned with
> get_user_pages(). AFAIK, it should be enough. This must be legacy cruft.

get_user_pages() and access_ok() do *very* different things. Even if
the pages are pinned, you might still be tricked into referencing off
the end of the page, or up into the kernel address space.

>>> + if (addp.length & (PAGE_SIZE - 1))
>>> + return -EINVAL;
>>> +
>>> + if (addp.offset + addp.length - PAGE_SIZE >= encl->size)
>>> + return -EINVAL;
>>> +
>>> + if (copy_from_user(&secinfo, (void __user *)addp.secinfo,
>>> + sizeof(secinfo)))
>>> + return -EFAULT;
>>> +
>>> + if (sgx_validate_secinfo(&secinfo))
>>> + return -EINVAL;
>>> +
>>> + for (c = 0 ; c < addp.length; c += PAGE_SIZE) {
>>> + if (signal_pending(current)) {
>>> + if (!c)
>>> + ret = -ERESTARTSYS;
>>> +
>>> + break;
>>> + }
>>> +
>>> + if (c == SGX_MAX_ADD_PAGES_LENGTH)
>>> + break;
>>> +
>>> + if (need_resched())
>>> + cond_resched();
>>> +
>>> + ret = sgx_encl_add_page(encl, addp.src + c, addp.offset + c,
>>> + &secinfo, addp.flags);
>>
>> Yeah... Don't we need to do another access_ok() check here, if we
>> needed one above since we are moving away from addrp.src?
>
> I don't think so because the page is pinned with get_user_pages().

No, get_user_pages() is orthogonal.

Looking at this again, you _might_ be OK since you validated addp.length
against encl->size. But, it's all very convoluted and doesn't look very
organized or obviously right.

So, this begs the question: What, exactly are the guarantees you are
expecting out of get_user_pages() here?

I also think it's an absolute requirement that if you're passing around
userspace pointers that you tag them as __user, not pass around as
unsigned longs.