Re: resctrl2 - status

From: Jonathan Cameron
Date: Mon Sep 18 2023 - 06:45:16 EST


On Fri, 15 Sep 2023 10:55:58 -0700
Drew Fustini <dfustini@xxxxxxxxxxxx> wrote:

> On Fri, Sep 08, 2023 at 04:13:54PM -0700, Tony Luck wrote:
> > On Fri, Sep 08, 2023 at 04:35:05PM -0500, Moger, Babu wrote:
> > > Hi Tony,
> > >
> > >
> > > On 9/8/2023 1:51 PM, Luck, Tony wrote:
> > > > > > Can you try this out on an AMD system. I think I covered most of the
> > > > > > existing AMD resctrl features, but I have no machine to test the code
> > > > > > on, so very likely there are bugs in these code paths.
> > > > > >
> > > > > > I'd like to make any needed changes now, before I start breaking this
> > > > > > into reviewable bite-sized patches to avoid too much churn.
> > > > > I tried your latest code briefly on my system. Unfortunately, I could
> > > > > not get it to work on my AMD system.
> > > > >
> > > > > # git branch -a
> > > > > next
> > > > > * resctrl2_v65
> > > > > > # uname -r
> > > > > > 6.5.0+
> > > > > > # lsmod | grep rdt
> > > > > rdt_show_ids 12288 0
> > > > > rdt_mbm_local_bytes 12288 0
> > > > > rdt_mbm_total_bytes 12288 0
> > > > > rdt_llc_occupancy 12288 0
> > > > > rdt_l3_cat 16384 0
> > > > >
> > > > > # lsmod |grep mbe
> > > > > amd_mbec 16384 0
> > > > >
> > > > > I could not get rdt_l3_mba
> > > > >
> > > > > # modprobe rdt_l3_mba
> > > > > modprobe: ERROR: could not insert 'rdt_l3_mba': No such device
> > > > >
> > > > > I don't see any data for the default group either.
> > > > >
> > > > > mount -t resctrl resctrl /sys/fs/resctrl/
> > > > >
> > > > > cd /sys/fs/resctrl/mon_data/mon_L3_00
> > > > >
> > > > > cat mbm_summary
> > > > > n/a n/a /
> > > > Babu,
> > > >
> > > > Thanks a bunch for taking this for a quick spin. There are several bits of
> > > > good news there. Several modules automatically loaded as expected.
> > > > Nothing went "OOPS" and crashed the system.
> > > >
> > > > Here’s the code that the rdt_l3_mba module runs that can cause failure
> > > > to load with "No such device"
> > > >
> > > > if (!boot_cpu_has(X86_FEATURE_RDT_A)) {
> > > >         pr_debug("No RDT allocation support\n");
> > > >         return -ENODEV;
> > > > }
> > >
> > > Shouldn't this be (or similar)?
> > >
> > > if (!rdt_cpu_has(X86_FEATURE_MBA))
> > >                 return false;
> >
> > Yes. I should be using X86_FEATURE bits where they are available
> > rather than peeking directly at CPUID register bits.
> >
> > >
> > > > mba_features = cpuid_ebx(0x10);
> > > >
> > > > if (!(mba_features & BIT(3))) {
> > > >         pr_debug("No RDT MBA allocation\n");
> > > >         return -ENODEV;
> > > > }
> > > >
> > > > I assume the first test must have succeeded (same code in rdt_l3_cat, and
> > > > that loaded OK). So it must be the second that failed. How does AMD
> > > > enumerate MBA support?
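
To see what the second check would observe on a given machine, the same bit can be read from userspace. The following is only a minimal sketch, assuming a GCC or Clang toolchain that provides <cpuid.h>; the helper names (ebx_has_mba, read_rdt_ebx, report_mba) are made up for illustration and are not part of resctrl or resctrl2. It mirrors the quoted test of bit 3 in EBX of CPUID leaf 0x10 and does not cover any other way a CPU might enumerate MBA.

```c
/*
 * Userspace sketch, not the resctrl2 module code: decode the same
 * CPUID leaf 0x10 EBX bit that the quoted check tests.
 */
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper: __get_cpuid_count() */
#endif

#define RDT_LEAF        0x10u           /* leaf used in the quoted check */
#define RDT_MBA_BIT     (1u << 3)       /* BIT(3) in the quoted check */

/* Return 1 if an EBX value from CPUID leaf 0x10 has bit 3 set, else 0. */
int ebx_has_mba(unsigned int ebx)
{
        return (ebx & RDT_MBA_BIT) != 0;
}

/*
 * Hypothetical helper: read EBX of leaf 0x10, subleaf 0.
 * Returns 1 on success, 0 if the leaf is unsupported or this is not x86.
 */
int read_rdt_ebx(unsigned int *ebx)
{
#if defined(__x86_64__) || defined(__i386__)
        unsigned int eax, ecx, edx;

        return __get_cpuid_count(RDT_LEAF, 0, &eax, ebx, &ecx, &edx);
#else
        (void)ebx;
        return 0;
#endif
}

/* Print what the bit looks like on the machine this runs on. */
void report_mba(void)
{
        unsigned int ebx = 0;

        if (!read_rdt_ebx(&ebx)) {
                puts("CPUID leaf 0x10 not available");
                return;
        }
        printf("leaf 0x10 EBX=%#x, bit 3 %s\n",
               ebx, ebx_has_mba(ebx) ? "set" : "clear");
}
```

Calling report_mba() from a small main() on the AMD test box would show whether bit 3 is clear there, which would be consistent with the "No such device" result from modprobe.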
> > > >
> > > > It is less obvious why the mbm_summary file fails to show any data.
> > > > The rdt_mbm_local_bytes and rdt_mbm_total_bytes modules loaded OK.
> > > > So I'm looking for the right CPUID bits to detect memory bandwidth
> > > > monitoring.
> > >
> > > I am still not sure if resctrl2 will address all the current gaps in
> > > resctrl1. We should probably list all issues on the table before we go that
> > > route.
> >
> > Indeed yes! I don't want to have to do resctrl3 in a few years to
> > cover gaps that could have been addressed in resctrl2.
> >
> > However, fixing resctrl gaps is only one of the motivations for
> > the rewrite. The bigger one is making life easier for all the
> > architectures sharing the common code to do what they need to
> > for their own quirks & differences without cluttering the
> > common code base, or worrying "did my change just break something
> > for another CPU architecture".
> >
> > > One of the main issues for AMD is the coupling of LLC domains.
> > >
> > > For example, AMD hardware supports 16 CLOSIDs per LLC domain. But the Linux
> > > design assumes that there are 16 CLOSIDs in total for the whole
> > > system. We can only create 16 CLOSIDs now, irrespective of how many domains
> > > there are.
> > >
> > > In reality, we should be able to create "16 x number of LLC domains" CLOSIDs
> > > in the system.  This is more evident on AMD. But the same problem applies to
> > > Intel with multiple sockets.
> >
> > I think this can be somewhat achieved already with a combination of
> > resctrl and cpusets (or some other way to set CPU affinity for tasks
> > to only run on CPUs within a specific domain, or set of domains).
> > That's why the schemata file allows setting different CBM masks
> > per domain.
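
As an illustration of the per-domain masks mentioned above, here is a small sketch that builds one L3 schemata line with a different CBM for each domain. It assumes the existing resctrl text format ("L3:<id>=<hex mask>;<id>=<hex mask>") and, purely for simplicity, that domain ids run from 0 to ndomains-1; the function name format_l3_schemata is hypothetical and nothing here is taken from the resctrl2 code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a line such as "L3:0=ffff;1=ff" from one capacity bitmask (CBM)
 * per domain. Domain ids are assumed to be 0..ndomains-1, which is an
 * illustration-only simplification.
 *
 * Returns the number of characters written (excluding the trailing NUL),
 * or -1 if the buffer is too small.
 */
int format_l3_schemata(char *buf, size_t len,
                       const unsigned int *cbm, size_t ndomains)
{
        size_t off;
        size_t i;
        int n;

        n = snprintf(buf, len, "L3:");
        if (n < 0 || (size_t)n >= len)
                return -1;
        off = (size_t)n;

        for (i = 0; i < ndomains; i++) {
                /* Entries are separated by ';', with no separator before the first. */
                n = snprintf(buf + off, len - off, "%s%zu=%x",
                             i ? ";" : "", i, cbm[i]);
                if (n < 0 || (size_t)n >= len - off)
                        return -1;
                off += (size_t)n;
        }
        return (int)off;
}
```

The resulting string is what would be written to a group's schemata file. Combined with CPU affinity that pins the group's tasks to one domain, the masks given for the other domains do not affect those tasks.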
> >
> > Can you explain how you would use 64 CLOSIDs on a system with 4 domains
> > and 16 CLOSIDs per domain?
> >
> > > My 2 cents. Hope to discuss more in our upcoming meeting.
> > Agreed. This will be faster when we can talk instead of type :-)
>
> Is it a meeting that other interested developers can join?
>
> This reminds me that Linux Plumbers Conference [1] is in November and
> I think resctrl2 could be a good topic. The CFP is still open for Birds
> of a Feather (BoF) proposals [2]. These are free-form get-togethers for
> people wishing to discuss a particular topic, and I have had success
> hosting them in the past for topics like pinctrl and gpio.
>
> Anyone planning to attend Plumbers?
>
> I'll be going in person but the virtual option works really well in my
> experience. I had developers and maintainers attending virtually
> participate in my BoF sessions and I felt it was very productive.

FWIW I'm keen and should be there in person. However, I'm not on the must
be available list for this one ;) Agree that hybrid worked fine for BoF last
year.

Jonathan


>
> thanks,
> drew
>
> [1] https://lpc.events/
> [2] https://lpc.events/event/17/abstracts/