Re: [PATCH v6 6/8] x86/resctrl: Introduce snc_nodes_per_l3_cache

From: Tony Luck
Date: Fri Sep 29 2023 - 16:48:27 EST


On Fri, Sep 29, 2023 at 04:21:54PM +0200, Peter Newman wrote:
> Hi Tony,
>
> On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@xxxxxxxxx> wrote:
> >
> > Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
> > and memory controllers on a socket into two or more groups. These are
> > presented to the operating system as NUMA nodes.
> >
> > This may enable some workloads to have slightly lower latency to memory
> > as the memory controller(s) in an SNC node are electrically closer to the
> > CPU cores on that SNC node. This cost may be offset by lower bandwidth
> > since the memory accesses for each core can only be interleaved between
> > the memory controllers on the same SNC node.
> >
> > Resctrl monitoring on Intel systems depends upon attaching RMIDs to tasks
> > to track L3 cache occupancy and memory bandwidth. There is an MSR that
> > controls how the RMIDs are shared between SNC nodes.
> >
> > The default mode divides them numerically. E.g. when there are two SNC
> > nodes on a socket the lower number half of the RMIDs are given to the
> > first node, the remainder to the second node. This would be difficult
> > to use with the Linux resctrl interface as specific RMID values assigned
> > to resctrl groups are not visible to users.
> >
> > The other mode divides the RMIDs and renumbers the ones on the second
> > SNC node to start from zero.
> >
> > Even with this redumbering SNC mode requires several changes in resctrl
> > behavior for correct operation.
>
> redumbering? Harsh.

Spell check didn't catch this. Implies that redumbering is a real word!
Now I need to find an opportunity to use it :-)

Changed to renumbering.
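To make the two MSR modes concrete, here is a tiny illustrative sketch (not
kernel code; the function name is made up for this example). In the default
mode a node's RMIDs are a numeric slice of the global space, while in the
renumbering mode every SNC node sees the same reduced range starting at zero:

```c
#include <assert.h>

/*
 * Illustrative only: with "share and renumber" mode, each SNC node
 * gets num_rmids / snc_nodes RMIDs, and each node numbers its share
 * from zero, so resctrl can keep using small RMID values everywhere.
 */
static unsigned int rmids_per_snc_node(unsigned int num_rmids,
				       unsigned int snc_nodes)
{
	return num_rmids / snc_nodes;
}
```

E.g. with 256 RMIDs and two SNC nodes per socket, each node has RMIDs 0-127
in renumbering mode, instead of node 0 owning 0-127 and node 1 owning 128-255.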
>
>
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index b0901fb95aa9..a5404c412f53 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1357,7 +1357,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
> > }
> > }
> >
> > - return size;
> > + return size / snc_nodes_per_l3_cache;
>
> To confirm, the size represented by a bit goes down rather than the
> CBM mask shrinking in each sub-NUMA domain?

Correct. The CBM mask keeps the same number of bits, and all of those
bits remain available to control allocation in each SNC node. But the
amount of cache controlled by each bit is reduced by the number of SNC
nodes sharing the L3 cache.
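A minimal sketch of the arithmetic (function name invented for illustration;
the real code is the `size` adjustment in rdtgroup_cbm_to_size() above):

```c
#include <assert.h>

/*
 * Illustrative: the CBM width (cbm_len) is unchanged under SNC, so the
 * per-bit size a user sees must shrink by snc_nodes_per_l3_cache.
 */
static unsigned int cbm_bit_size(unsigned int cache_size_bytes,
				 unsigned int cbm_len,
				 unsigned int snc_nodes_per_l3_cache)
{
	return cache_size_bytes / cbm_len / snc_nodes_per_l3_cache;
}
```

So on a hypothetical 32 MB L3 with a 16-bit CBM, each bit represents 2 MB
without SNC, but only 1 MB with two SNC nodes per cache.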

>
> I would maybe have expected the CBM mask to already be allocating at
> the smallest granularity the hardware supports.
>
> > }
> >
> > /**
> > @@ -2590,7 +2590,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > ctx->enable_cdpl2 = true;
> > return 0;
> > case Opt_mba_mbps:
> > - if (!supports_mba_mbps())
> > + if (!supports_mba_mbps() || snc_nodes_per_l3_cache > 1)
>
> Factor into supports_mba_mbps()?

Agreed. That's a better place. Moved this test into supports_mba_mbps().
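Roughly, the refactored helper would look something like this sketch (the
surrounding prerequisites are stand-ins, not the real supports_mba_mbps()
body; only the added SNC check comes from this thread):

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int snc_nodes_per_l3_cache = 1;

/* Stand-in for the existing mba/mbm capability checks. */
static bool mba_mbm_capable(void)
{
	return true;
}

/*
 * Sketch of the suggested refactor: mba_MBps cannot work when several
 * SNC nodes share one L3, so fold that test into supports_mba_mbps()
 * instead of open-coding it in rdt_parse_param().
 */
static bool supports_mba_mbps(void)
{
	return mba_mbm_capable() && snc_nodes_per_l3_cache <= 1;
}
```

With the check inside the helper, the Opt_mba_mbps case in
rdt_parse_param() stays a plain `if (!supports_mba_mbps())` test.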
>
> > return -EINVAL;
> > ctx->enable_mba_mbps = true;
> > return 0;
> > --
> > 2.41.0
> >
>
> Reviewed-by: Peter Newman <peternewman@xxxxxxxxxx>

Thanks

-Tony