Re: [PATCH v22 09/24] x86/sgx: Add functions to allocate and free EPC pages

From: Sean Christopherson
Date: Mon Oct 07 2019 - 10:50:16 EST


On Sat, Oct 05, 2019 at 06:44:08PM +0200, Borislav Petkov wrote:
> On Tue, Sep 03, 2019 at 05:26:40PM +0300, Jarkko Sakkinen wrote:
> > +/**
> > + * __sgx_free_page - Free an EPC page
> > + * @page: pointer a previously allocated EPC page
> > + *
> > + * EREMOVE an EPC page and insert it back to the list of free pages.
> > + *
> > + * Return:
> > + * 0 on success
> > + * SGX error code if EREMOVE fails
> > + */
> > +int __sgx_free_page(struct sgx_epc_page *page)
> > +{
> > + struct sgx_epc_section *section = sgx_epc_section(page);
> > + int ret;
>
> Shouldn't you be grabbing the lock here already?
>
> Or nothing can happen to that page from another thread after you
> ENCLS[EREMOVE] it and before it is added to the ->page_list of the
> section?

The caller is responsible for ensuring EREMOVE can be safely executed,
e.g. by holding the enclave's lock.

For many ENCLS leafs, EREMOVE included, the CPU requires exclusive access
to the SGX Enclave Control Structures (SECS)[*] and will signal a #GP if
a different logical CPU is already executing an ENCLS leaf that requires
exclusive SECS access. The SGX subsystem uses a per-enclave mutex to
serialize such ENCLS leafs, among other things.

[*] The SECS is a per-enclave page that resides in the EPC and can only be
directly accessed by the CPU. It's used to track metadata about the
enclave, e.g. number of child pages, base, size, etc...

> > +
> > + ret = __eremove(sgx_epc_addr(page));
> > + if (ret)
> > + return ret;
> > +
> > + spin_lock(&section->lock);
> > + list_add_tail(&page->list, &section->page_list);
> > + section->free_cnt++;
> > + spin_unlock(&section->lock);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(__sgx_free_page);
> > +
> > +/**
> > + * sgx_free_page - Free an EPC page and WARN on failure
> > + * @page: pointer to a previously allocated EPC page
> > + *
> > + * EREMOVE an EPC page and insert it back to the list of free pages, and WARN
> > + * if EREMOVE fails. For use when the call site cannot (or chooses not to)
> > + * handle failure, i.e. the page is leaked on failure.
> > + */
> > +void sgx_free_page(struct sgx_epc_page *page)
> > +{
> > + int ret;
> > +
> > + ret = __sgx_free_page(page);
> > + WARN(ret > 0, "sgx: EREMOVE returned %d (0x%x)", ret, ret);
>
> That will potentially flood dmesg. Why are we even accommodating such
> callers? They either handle the error or they don't get to alloc EPC
> pages. There's also __must_check with which you can enforce the error
> code checking or we simply don't allow not handling failure. Fullstop.

It was deliberately left as a WARN as having multiple stack traces has
been very helpful for triaging and debugging software bugs. I agree that
it can and should be changed to a WARN_ONCE() for upstream.

As for why it exists at all, the WARN will only fire if there is a kernel
and/or CPU bug. In the vast majority of cases, EREMOVE is guaranteed to
be successful (assuming no bugs). In those situations, if EREMOVE fails
there really isn't a sane option other to WARN and leak the page, e.g.
the kernel can't override the CPUs protection to forcefully EREMOVE the
page.

The non-warn variant, __sgx_free_page(), is provided for the (currently)
one case where EREMOVE failure is expected/tolerable (opportunstically
freeing EPC pages when the enclave is killed).