Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for cgroup

From: Haitao Huang
Date: Wed Mar 27 2024 - 20:26:59 EST


On Thu, 22 Feb 2024 16:24:47 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:



On 23/02/2024 9:12 am, Haitao Huang wrote:
On Wed, 21 Feb 2024 04:48:58 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

On Wed, 2024-02-21 at 00:23 -0600, Haitao Huang wrote:
StartHi Kai
On Tue, 20 Feb 2024 03:52:39 -0600, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
[...]
>
> So you introduced the work/workqueue here but there's no place which
> actually
> queues the work. IMHO you can either:
>
> 1) move relevant code change here; or
> 2) focus on introducing core functions to reclaim certain pages from a
> given EPC
> cgroup w/o workqueue and introduce the work/workqueue in later patch.
>
> Makes sense?
>

Starting in v7, I was trying to split the big patch, #10 in v6 as you and
others suggested. My thought process was to put infrastructure needed for
per-cgroup reclaim in the front, then turn on per-cgroup reclaim in [v9
13/15] in the end.

That's reasonable for sure.

Thanks for the confirmation :-)


Before that, all reclaimables are tracked in the global LRU so really
there is no "reclaim certain pages from a given EPC cgroup w/o workqueue"
or reclaim through workqueue before that point, as suggested in #2. This
patch puts down the implementation for both flows but neither used yet, as
stated in the commit message.

I know it's not used yet. The point is how to split patches to make them more
self-contain and easy to review.
I would think this patch already self-contained in that all are implementation of cgroup reclamation building blocks utilized later. But I'll try to follow your suggestions below to split further (would prefer not to merge in general unless there is strong reasons).


For #2, sorry for not being explicit -- I meant it seems it's more reasonable to
split in this way:

Patch 1)
a). change to sgx_reclaim_pages();
I'll still prefer this to be a separate patch. It is self-contained IMHO.
We were splitting the original patch because it was too big. I don't want to merge back unless there is a strong reason.

b). introduce sgx_epc_cgroup_reclaim_pages();
Ok.

If I got you right, I believe you want to have a cgroup variant function following the same behaviour of the one for global reclaim, i.e., the _current_ sgx_reclaim_pages(), which always tries to scan and reclaim SGX_NR_TO_SCAN pages each time.

And this cgroup variant function, sgx_epc_cgroup_reclaim_pages(), tries to scan and reclaim SGX_NR_TO_SCAN pages each time "_across_ the cgroup and all the descendants".

And you want to implement sgx_epc_cgroup_reclaim_pages() in this way due to WHATEVER reasons.

In that case, the change to sgx_reclaim_pages() and the introduce of sgx_epc_cgroup_reclaim_pages() should really be together because they are completely tied together in terms of implementation.

In this way you can just explain clearly in _ONE_ patch why you choose this implementation, and for reviewer it's also easier to review because we can just discuss in one patch.

Makes sense?


c). introduce sgx_epc_cgroup_reclaim_work_func() (use a better name), which just takes an EPC cgroup as input w/o involving any work/workqueue.
This is for the workqueue use only. So I think it'd be better be with patch #2 below?

There are multiple levels of logic here IMHO:

1. a) and b) above focus on "each reclaim" a given EPC cgroup
2. c) is about a loop of above to bring given cgroup's usage to limit
3. workqueue is one (probably best) way to do c) in async way
4. the logic where 1) (direct reclaim) and 3) (indirect) are triggered

To me, it's clear 1) should be in one patch as stated above.

Also, to me 3) and 4) are better to be together since they give you a clear view on how the direct/indirect reclaim are triggered.

2) could be flexible depending on how you see it. If you prefer viewing it from low-level implementation of reclaiming pages from cgroup, then it's also OK to be together with 1). If you want to treat it as a part of _async_ way of bring down usage to limit, then _MAYBE_ it's also OK to be with 3) and 4).

But to me 2) can be together with 1) or even a separate patch because it's still kinda of low-level reclaiming details. 3) and 4) shouldn't contain such detail but should focus on how direct/indirect reclaim is done.

I incorporated most of your suggestions, and think it'd be better discuss this with actual code.

So I'm sending out v10, and just quickly summarize what I did to address this particular issue here.

I pretty much follow above suggestions and end up with two patches:

1) a) and b) above plus direct reclaim triggered in try_charge() so reviewers can see at lease one use of the sgx_cgroup_reclaim_pages(), which is the basic building block.

2) All async related: c) above, workqueue, indirect triggered in try_charge() which queues the work.

Please review v10 and if you think the triggering parts need be separated then I'll separate.

Additionally, after more experimentation, I simplified sgx_reclaim_pages() by removing the pointer for *nr_to_scan as you suggested, but returning pages collected for isolation (attempted for reclaim) instead of pages actually reclaimed. I found performance is acceptable with this approach.

Thanks again for your review.
Haitao