Re: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]

From: Reinette Chatre
Date: Tue Jul 18 2023 - 19:45:59 EST

Next message: Yin Fengwei: "Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio"
Previous message: Guo Ren: "Re: [PATCH v3] riscv: entry: set a0 = -ENOSYS only when syscall != -1"
In reply to: Tony Luck: "Re: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]"
Next in thread: Luck, Tony: "RE: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Tony,

On 7/18/2023 3:57 PM, Tony Luck wrote:
> On Tue, Jul 18, 2023 at 01:40:32PM -0700, Reinette Chatre wrote:
>>> + [RDT_RESOURCE_NODE] =
>>> + {
>>> + .r_resctrl = {
>>> + .rid = RDT_RESOURCE_NODE,
>>> + .name = "L3",
>>> + .scope = SCOPE_NODE,
>>> + .domains = domain_init(RDT_RESOURCE_NODE),
>>> + .fflags = 0,
>>> + },
>>> + },
>>> };
>>
>> So the new resource has the same name, from user perspective,
>> as RDT_RESOURCE_L3. From this perspective it thus seems to be a
>> shadow of RDT_RESOURCE_L3 that is used as alternative for some properties
>> of the actual RDT_RESOURCE_L3? This is starting to look as though this
>> solution is wrenching itself into current architecture.
>>
>> >From what I can tell the monitoring in SNC environment needs a different
>> domain list because of the change in scope. What else is needed in the
>> resource that is different from the existing L3 resource? Could the
>> monitoring scope of a resource not instead be made distinct from its
>> allocation scope? By default monitoring and allocation scope will be
>> the same and thus use the same domain list but when SNC is enabled
>> then monitoring uses a different domain list.
>
> Answering this part first, because my choice here affects a bunch
> of the code that also raised comments from you.

Indeed.

>
> The crux of the issue is that when SNC mode is enabled the scope
> for L3 monitoring functions changes to "node" scope, while the
> scope of L3 control functions (CAT, CDP) remains at L3 cache scope.
>
> My solution was to just create a new resource. But you have an
> interesing alternate solution. Add an extra domain list to the
> resource structure to allow creation of distinct domain lists
> for this case where the scope for control and monitor functions
> differs.
>
> So change the resource structure like this:
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 8334eeacfec5..01590aa59a67 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -168,10 +168,12 @@ struct rdt_resource {
> bool alloc_capable;
> bool mon_capable;
> int num_rmid;
> - int cache_level;
> + int ctrl_scope;
> + int mon_scope;

I am not sure about getting rid of cache_level so fast.
I see regarding the current problem being solved that
ctrl_scope would have the same values as cache_level but
I find that adding this level of indirection while keeping
the comparison with cacheinfo->level to create a trap
for future mistakes.

> struct resctrl_cache cache;
> struct resctrl_membw membw;
> - struct list_head domains;
> + struct list_head ctrl_domains;
> + struct list_head mon_domains;
> char *name;
> int data_width;
> u32 default_ctrl;
>
> and build/use separate domain lists for when this resource is
> being referenced for allocation/monitoring. E.g. domain_add_cpu()
> would check "r->alloc_capable" and add a cpu to the ctrl_domains
> list based on the ctrl_scope value. It would do the same with
> mon_capable / mon_domains / mon_scope.
>
> If ctrl_scope == mon_scope, just build one list as you suggest above.

Yes, this is the idea. Thank you for considering it. Something else
to consider that may make this even cleaner/simpler would be to review
struct rdt_domain and struct rdt_hw_domain members for "monitor" vs "control"
usage. These structs could potentially be split further into separate
"control" and "monitor" variants. For example, "struct rdt_domain" split into
"struct rdt_ctrl_domain" and "struct rdt_mon_domain". If there is a clean
split then resctrl can always create two lists with the unnecessary duplication
eliminated when two domain lists are created. This would also
eliminate the need to scatter ctrl_scope == mon_scope checks throughout.

>
> Maybe there are more places that walk the list of control domains than
> walk the list of monitor domains. Need to audit this set:
>
> $ git grep list_for_each.*domains -- arch/x86/kernel/cpu/resctrl
> arch/x86/kernel/cpu/resctrl/core.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/core.c: list_for_each(l, &r->domains) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/monitor.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c: list_for_each_entry(d_i, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r_l->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &s->res->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: list_for_each_entry(d, &r->domains, list) {
>
> Maybe "domains" can keep its name and make a "list_for_each_monitor_domain()" macro
> to pick the right list to walk?

It is not clear to me how "domains" can keep its name. If I understand
the macro would be useful if scope always needs to be considered. I wonder
if the list walkers may not mostly just walk the appropriate list directly
if resctrl always creates separate "control domain" and "monitor domain"
lists.

> I don't think this will reduce the amount of code change in a
> significant way. But it may be conceptually easier to follow
> what is going on.

Reducing the amount of code changed is not a goal to me. If I understand
correctly I think that adapting resctrl to support different monitor and
control scope could create a foundation into which SNC can slot in smoothly.

Reinette

Next message: Yin Fengwei: "Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio"
Previous message: Guo Ren: "Re: [PATCH v3] riscv: entry: set a0 = -ENOSYS only when syscall != -1"
In reply to: Tony Luck: "Re: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]"
Next in thread: Luck, Tony: "RE: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]