Re: [PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function

From: Jarkko Sakkinen
Date: Sat Jan 13 2024 - 16:05:08 EST


On Fri Jan 12, 2024 at 7:07 PM EET, Haitao Huang wrote:
> On Sun, 17 Dec 2023 19:44:56 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
>
> >
> >> >
> >> > The point is, with or w/o this patch, you can only reclaim 16 EPC
> >> pages
> >> > in one
> >> > function call (as you have said you are going to remove
> >> > SGX_NR_TO_SCAN_MAX,
> >> > which is a cipher to both of us). The only difference I can see is,
> >> > with this
> >> > patch, you can have multiple calls of "isolate" and then call the
> >> > "do_reclaim"
> >> > once.
> >> >
> >> > But what's the good of having the "isolate" if the "do_reclaim" can
> >> only
> >> > reclaim
> >> > 16 pages anyway?
> >> >
> >> > Back to my last reply, are you afraid of any LRU has less than 16
> >> pages
> >> > to
> >> > "isolate", therefore you need to loop LRUs of descendants to get 16?
> >> > Cause I
> >> > really cannot think of any other reason why you are doing this.
> >> >
> >> >
> >>
> >> I think I see your point. By capping pages reclaimed per cycle to 16,
> >> there is not much difference even if those 16 pages are spread in
> >> separate
> >> LRUs . The difference is only significant when we ever raise that cap.
> >> To
> >> preserve the current behavior of ewb loops independent on number of LRUs
> >> to loop through for each reclaiming cycle, regardless of the exact value
> >> of the page cap, I would still think current approach in the patch is
> >> reasonable choice. What do you think?
> >
> > To me I won't bother to do that. Having less than 16 pages in one LRU is
> > *extremely rare* that should never happen in practice. It's pointless
> > to make
> > such code adjustment at this stage.
> >
> > Let's focus on enabling functionality first. When you have some real
> > performance issue that is related to this, we can come back then.
> >
>
> I have done some rethinking about this and realize this does save quite
> some significant work: without breaking out isolation part from
> sgx_reclaim_pages(), I can remove the changes to use a list for isolated
> pages, and no need to introduce "state" such as RECLAIM_IN_PROGRESS. About
> 1/3 of changes for per-cgroup reclamation will be gone.
>
> So I think I'll go this route now. The only downside may be performance if
> a enclave spreads its pages in different cgroups and even that is minimum
> impact as we limit reclamation to 16 pages a time. Let me know if someone
> feel strongly we need dealt with that and see some other potential issues
> I may have missed.

We could deal with possible performance regression later on (if there
is need). I mean there should we a workload first that would so that
sort stimulus...

> Thanks
>
> Haitao

BR, Jarkko