Re: [PATCH v38 13/24] x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES

From: Jarkko Sakkinen
Date: Mon Sep 21 2020 - 17:24:23 EST


On Mon, Sep 21, 2020 at 12:57:39PM -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 10:44:19PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Sep 21, 2020 at 09:49:48PM +0300, Jarkko Sakkinen wrote:
> > > To have understandable semantics, you have to map error codes to
> > > conditions rather than to opcodes. -EIO means loss of the enclave in
> > > the event of the EPC going invalid. The enclave is already lost, which
> > > is the reason why we deinitialize the kernel data structures.
> > >
> > > EADD must have a different error code because nothing is actually lost;
> > > the failure condition is triggered from outside. -EFAULT would probably
> > > be the most reasonable choice for that.
> >
> > Now that I have made all the changes discussed, I remember why EADD
> > and EEXTEND had a common error code and common behaviour: EADD can
> > also fail because of an EPC reset, since it depends on a valid SECS
> > page.
> >
> > If we cannot distinguish an EADD failure caused by EPC loss from one
> > caused by problems with the source, both should get the same error
> > code, and the enclave should also be deinitialized whenever this happens.
>
> Hmm, on SGX2 hardware the kernel can precisely and accurately identify loss
> of EPC, or at least "problem with the EPCM", as such a condition will be a
> page fault with PFEC.SGX=1.

True.
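For illustration, a minimal sketch of the check Sean describes, assuming the
architectural page-fault error code layout in which bit 15 is the SGX bit
(Linux calls it X86_PF_SGX). The helper name pf_is_epcm_fault() is
hypothetical, and as noted below, the ENCLS call site currently has no way to
receive this value without a new fixup handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of the x86 page-fault error code (PFEC) is the SGX bit; Linux
 * names it X86_PF_SGX. Per the discussion, on SGX2 hardware an EPCM
 * problem (e.g. EPC lost across a power cycle) is reported this way. */
#define PF_SGX_BIT (UINT32_C(1) << 15)

/* Hypothetical helper: true if the #PF taken during ENCLS reports an EPCM
 * problem, false if it is an ordinary fault (e.g. a bad source page). */
static bool pf_is_epcm_fault(uint32_t error_code)
{
	return (error_code & PF_SGX_BIT) != 0;
}
```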

> But getting that info back to the ENCLS invocation would require adding a
> new exception fixup handler in order to "return" the error code. Given that
> this is the only case where that level of precision makes a difference, I
> think it's ok to just kill the enclave on any EADD failure. Practically
> speaking, I highly doubt the overzealous killing will impact userspace; I
> would imagine any SGX runtime would treat -EFAULT as fatal anyway.

This is true. We could do this if we wanted to. In practice, a bad source
address requires either a badly behaving run-time or someone doing it on
purpose. In the former case, the run-time is broken by definition, so the
extra granularity would not improve the situation. In the latter case, it
is not something we want to actively support.
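A rough model of the "kill the enclave on any EADD failure" policy, written as
plain userspace C rather than kernel code. struct enclave, eadd_fn,
enclave_add_page() and the fake_eadd_*() stubs are all hypothetical; -EIO is
assumed as the single error code, following the EEXTEND/EPC-loss mapping above,
since the thread does not pin down the final value.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the EADD leaf: 0 on success, nonzero on any failure. On SGX1
 * a bad source and a lost EPC are indistinguishable at this point. */
typedef int (*eadd_fn)(unsigned long src);

/* Hypothetical, simplified enclave state (not the kernel's struct sgx_encl). */
struct enclave {
	bool dead;              /* set once the enclave has been deinitialized */
	unsigned int nr_pages;  /* pages successfully added so far */
};

/* Every EADD failure kills the enclave and maps to one error code, so the
 * caller is never handed a condition that looks retryable. */
static int enclave_add_page(struct enclave *encl, eadd_fn eadd, unsigned long src)
{
	if (encl->dead)
		return -EIO;            /* already deinitialized: refuse further adds */

	if (eadd(src) != 0) {
		encl->dead = true;      /* deinitialize instead of guessing the cause */
		return -EIO;
	}

	encl->nr_pages++;
	return 0;
}

/* Example stubs for illustration only. */
static int fake_eadd_ok(unsigned long src) { (void)src; return 0; }
static int fake_eadd_fail(unsigned long src) { (void)src; return 1; }
```

Collapsing both causes into one code trades precision for simplicity, which
matches the argument that a correctly behaving run-time never hits the
bad-source case.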

I guess I'll still have to correct the documentation just a bit.

> Side topic: this does invalidate my argument for not killing the enclave
> on EADD failure. If EADD fails due to loss of EPC, userspace could
> theoretically get stuck in an infinite loop if it naively retries on
> -EIO or whatever.

I don't think we care about that unless it rules out some legitimate,
correctly working feature for a run-time.
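To make the retry concern concrete, a sketch of how a run-time might avoid the
infinite loop Sean mentions: retry only errnos assumed to be transient, with a
bound, and treat -EIO/-EFAULT as fatal. add_op_fn, add_pages_with_retry(),
MAX_RETRIES and the op_*() stubs are hypothetical, and which errnos count as
transient (EINTR, EAGAIN here) is an assumption, not something this thread
specifies.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_RETRIES 8

/* Stand-in for the real ioctl call: returns 0 or a negative errno. */
typedef int (*add_op_fn)(void *ctx);

/* Assumed set of transient errnos worth retrying. */
static bool is_transient(int err)
{
	return err == EINTR || err == EAGAIN;
}

/* Retry transient failures a bounded number of times; return immediately on
 * success or on a fatal error such as -EIO (enclave killed) or -EFAULT. */
static int add_pages_with_retry(add_op_fn op, void *ctx)
{
	int ret = 0;

	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		ret = op(ctx);
		if (ret == 0 || !is_transient(-ret))
			return ret;
	}
	return ret;  /* still transient after MAX_RETRIES retries */
}

/* Example stubs for illustration only; ctx counts calls. */
static int op_always_eio(void *ctx) { ++*(int *)ctx; return -EIO; }
static int op_eintr_then_ok(void *ctx) { return ++*(int *)ctx < 3 ? -EINTR : 0; }
```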

/Jarkko