Re: [PATCH v21 18/28] x86/sgx: Add swapping code to the core and SGX driver

From: Jarkko Sakkinen
Date: Wed Aug 07 2019 - 17:22:31 EST


On Wed, Aug 07, 2019 at 06:33:32AM +0000, Jethro Beekman wrote:
> On 2019-07-13 10:07, Jarkko Sakkinen wrote:
> > Because the kernel is untrusted, swapping pages in/out of the Enclave
> > Page Cache (EPC) has specialized requirements:
> >
> > * The kernel cannot directly access EPC memory, i.e. cannot copy data
> > to/from the EPC.
> > * To evict a page from the EPC, the kernel must "prove" to hardware that
> > there are no valid TLB entries for said page since a stale TLB entry
> > would allow an attacker to bypass SGX access controls.
> > * When loading a page back into the EPC, hardware must be able to verify
> > the integrity and freshness of the data.
> > * When loading an enclave page, e.g. regular pages and Thread Control
> > Structures (TCS), hardware must be able to associate the page with a
> > Secure Enclave Control Structure (SECS).
> >
> > To satisfy the above requirements, the CPU provides dedicated ENCLS
> > functions to support paging data in/out of the EPC:
> >
> > * EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting
> > to access a blocked page that misses the TLB will fault.
> > * ETRACK: Activate blocking tracking. Hardware verifies that all
> > translations for pages marked as "blocked" have been flushed
> > from the TLB.
> > * EPA: Add version array page to the EPC. As the name suggests, a
> > VA page is a 512-entry array of version numbers that are
> > used to uniquely identify pages evicted from the EPC.
> > * EWB: Write back a page from EPC to memory, e.g. RAM. Software
> > must supply a VA slot, memory to hold the Paging Crypto
> > Metadata (PCMD) of the page and obviously backing storage for
> > the evicted page.
> > * ELD{B,U}: Load a page in {un}blocked state from memory to EPC. The
> > driver only uses the ELDU variant as there is no use case
> > for loading a page as "blocked" in a bare metal environment.
> >
> > To top things off, all of the above ENCLS functions are subject to
> > strict concurrency rules, e.g. many operations will #GP fault if two
> > or more operations attempt to access common pages/structures.
> >
> > To put it succinctly, paging in/out of the EPC requires coordinating
> > with the SGX driver where all of an enclave's tracking resides. But,
> > simply shoving all reclaim logic into the driver is not desirable as
> > doing so has unwanted long term implications:
> >
> > * Oversubscribing EPC to KVM guests, i.e. virtualizing SGX in KVM and
> > swapping a guest's EPC pages (without the guest's cooperation) needs
> > the same high level flows for reclaim but has painfully different
> > semantics in the details.
> > * Accounting EPC, i.e. adding an EPC cgroup controller, is desirable
> > as EPC is effectively a specialized memory type and even more scarce
> > than system memory. Providing a single touchpoint for EPC accounting
> > regardless of end consumer greatly simplifies the EPC controller.
> > * Allowing the userspace-facing driver to be built as a loadable module
> > is desirable, e.g. for debug, testing and development. The cgroup
> > infrastructure does not support dependencies on loadable modules.
> > * Separating EPC swapping from the driver once it has been tightly
> > coupled to the driver is non-trivial (speaking from experience).
>
> Some of these points seem stale now.

Thanks for spotting that. I'll do a full edit of the commit message and
try to make it shorter and more to the point.
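
As a reference point for that rewrite, the eviction ordering in the quoted
text (EBLOCK, then ETRACK, then a TLB flush, then EWB) can be sketched as a
toy C model. Everything below is hypothetical: the toy_* names, the epoch
counters and the return codes are illustrative assumptions only and match
neither the kernel's ENCLS wrappers nor the hardware's internal state.

```c
#include <assert.h>

/* Toy model only: all names and fields here are hypothetical. */
enum toy_page_state { TOY_VALID, TOY_BLOCKED, TOY_EVICTED };

struct toy_enclave {
	unsigned int epoch;         /* bumped by each toy_etrack() */
	unsigned int flushed_epoch; /* newest epoch with TLBs flushed */
};

struct toy_page {
	enum toy_page_state state;
	unsigned int blocked_epoch; /* epoch in which the page was blocked */
};

/* EBLOCK analogue: mark the page blocked and remember the epoch. */
static int toy_eblock(struct toy_enclave *e, struct toy_page *p)
{
	if (p->state != TOY_VALID)
		return -1;
	p->state = TOY_BLOCKED;
	p->blocked_epoch = e->epoch;
	return 0;
}

/* ETRACK analogue: start a new tracking cycle. */
static void toy_etrack(struct toy_enclave *e)
{
	assert(e->flushed_epoch <= e->epoch);
	e->epoch++;
}

/* Stand-in for flushing stale translations on all CPUs. */
static void toy_flush_tlbs(struct toy_enclave *e)
{
	e->flushed_epoch = e->epoch;
}

/*
 * EWB analogue: refuse to evict unless the page is blocked (-1) and a
 * tracking cycle started after the block has been flushed (-2).
 */
static int toy_ewb(struct toy_enclave *e, struct toy_page *p)
{
	if (p->state != TOY_BLOCKED)
		return -1;
	if (e->flushed_epoch <= p->blocked_epoch)
		return -2;
	p->state = TOY_EVICTED;
	return 0;
}
```

In this model, calling toy_ewb() straight after toy_eblock() fails, and so
does calling it after toy_etrack() alone. It only succeeds once
toy_flush_tlbs() has run for the new epoch, which is the "prove there are
no valid TLB entries" requirement in miniature.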

/Jarkko