Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Moger, Babu
Date: Mon Feb 19 2024 - 13:00:59 EST


Hi Peter,

On 2/16/24 14:18, Peter Newman wrote:
> Hi Babu,
>
> On Thu, Feb 8, 2024 at 9:29 AM Moger, Babu <babu.moger@xxxxxxx> wrote:
>> On 2/5/24 16:38, Reinette Chatre wrote:
>>> This could be improved beyond a binary "enable"/"disable" interface to user space.
>>> For example, the hardware can discover which "mbm counter assign" related feature
>>> (I'm counting the "soft RMID" here as one of the "mbm counter assign" related
>>> features) is supported on the platform and it can be presented to the user like:
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>> [feature_1] feature_2 feature_3
>>
>> How about this?
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> ABMC:Capable
>> SOFT-RMID:Capable
>>
>> To enable ABMC
>> # echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign
>>
>> When ABMC is enabled:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> ABMC:Enable
>> SOFT-RMID:Capable
>
> There would be no need to use soft RMIDs on a system that supports
> ABMC, so I can't think of a reason why the underlying implementation
> would matter to our users. The user should only have to request the
> interface where monitors must be assigned manually. The mount would
> succeed if the system has a way to support the interface.

Ok, sure. I will exclude soft-RMID from this interface.

For now, let's keep this only for ABMC.
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
ABMC:Capable

Or, once ABMC is enabled:

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
ABMC:Enable
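
A rough user-space sketch of how a tool could read this file, assuming the
one "FEATURE:State" pair per line format shown above. The function name and
the dict it returns are illustrative only; nothing here is an existing
resctrl API.

```python
def parse_mbm_assign(text):
    """Parse lines such as 'ABMC:Capable' into {'ABMC': 'Capable'}.

    Assumes the proposed format: one FEATURE:State pair per line.
    """
    features = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, sep, state = line.partition(":")
        if not sep or not name or not state:
            raise ValueError(f"malformed mbm_assign line: {line!r}")
        features[name] = state
    return features
```

With this, a caller would only need to check whether the "ABMC" entry reads
"Capable" or "Enable" before trying to assign counters.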

>
>
>>> You have made it clear on several occasions that you do not intend to support
>>> domain level assignment. That may be ok but the interface you create should
>>> not prevent future support of domain level assignment.
>>>
>>> If my point is not clear, could you please share how this interface is able to
>>> support domain level assignment in the future?
>>>
>>> I am starting to think that we need a file similar to the schemata file
>>> for group and domain level monitor configurations.
>>
>> Something like this?
>>
>> By default
>> #cat /sys/fs/resctrl/monitor_state
>> default:0=total=assign,local=assign;1=total=assign,local=assign
>>
>> With ABMC,
>> #cat /sys/fs/resctrl/monitor_state
>> ABMC:0=total=unassign,local=unassign;1=total=unassign,local=unassign
>
> The benefit from all the string parsing in this interface is only
> halving the number of monitor_state sysfs writes we'd need compared to
> creating a separate file for mbm_local and mbm_total. Given that our
> use case is to assign the 32 assignable counters to read the bandwidth
> of ~256 monitoring groups, this isn't a substantial gain to help us. I
> think you should just focus on providing the necessary control
> granularity without trying to consolidate writes in this interface. I

Ok. It looks like this interface needs to support assigning the RMIDs to
individual domains. I wasn't planning on that for now, but it can be done
without many changes.

Something like this (typos corrected: '=' replaced with '-' between the
event and its state).

# cat /sys/fs/resctrl/monitor_state
ABMC:0=total-unassign,local-unassign;1=total-unassign,local-unassign

To assign:

# echo "ABMC:0=total-assign,local-assign" > /sys/fs/resctrl/monitor_state
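
To make the grammar of that line concrete, here is a minimal user-space
sketch of a parser and formatter for it. It assumes the format is exactly
"MODE:domain=event-state,event-state;domain=..." with states "assign" and
"unassign"; the function names are hypothetical and this is not the kernel
implementation.

```python
VALID_STATES = ("assign", "unassign")

def parse_monitor_state(line):
    """Return (mode, {domain_id: {event: state}}) for one monitor_state line."""
    mode, sep, rest = line.strip().partition(":")
    if not sep or not mode:
        raise ValueError(f"missing mode prefix: {line!r}")
    domains = {}
    for chunk in rest.split(";"):
        dom, sep, events = chunk.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in domain entry: {chunk!r}")
        states = {}
        for item in events.split(","):
            event, sep, state = item.partition("-")
            if not sep or state not in VALID_STATES:
                raise ValueError(f"bad event entry: {item!r}")
            states[event] = state
        domains[int(dom)] = states
    return mode, domains

def format_monitor_state(mode, domains):
    """Build the string to write, e.g. 'ABMC:0=total-assign,local-assign'."""
    parts = []
    for dom in sorted(domains):
        events = ",".join(f"{ev}-{st}" for ev, st in domains[dom].items())
        parts.append(f"{dom}={events}")
    return f"{mode}:" + ";".join(parts)
```

A write that names only domain 0, as in the echo above, would leave the
other domains out of the string entirely.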


> will propose an additional interface below to optimize our use case.
>
> Whether mbm_total and mbm_local are combined in the group directories
> or not, I don't see why you wouldn't just repeat the same file
> interface in the domain directories for a user needing finer-grained
> controls.

I don't see the need for the same file inside each domain directory at the
group level when the above command can already assign the RMIDs per domain.

>
>
>>>> Peter, James,
>>>>
>>>> Please comment on what you want achieve in "assignment" based on the features you are working on.
>
> I prototyped and tested the following additional interface for the
> large-scale, batch use case that we're primarily concerned about:
>
> info/L3_MON/mbm_{local,total}_bytes_assigned
>
> Writing a whitespace-delimited list of mongroup directory paths does
> the following:
> 1. unassign all monitors for the given counter
> 2. assigns a monitor to each mongroup referenced in the write
> 3. batches per-domain register updates resulting from the assignments
> into a single IPI for each domain
>
> This interface allows us to do less sysfs writes and IPIs on systems
> with more assignable monitoring resources, rather than doing more.
>
> The reference to a mongroup when reading/writing the above node is the
> resctrl-root-relative path to the monitoring group. There is probably
> a more concise way to refer to the groups, but my prototype used
> kernfs_walk_and_get() to locate each rdtgroup struct.
>
> I would also like to add that in the software-ABMC prototype I made,
> because it's based on assignment of a small number of RMIDs,
> assignment results in all counters being assigned at once. On
> implementations where per-counter assignments aren't possible,
> assignment through such a resource would be allowed to assign more
> resources than explicitly requested.
>
> This would allow an implementation only capable of global assignment
> to assign resources to all groups when a non-empty string is written
> to the proposed file nodes, and all resources to be unassigned when an
> empty string is written. Reading back from the file nodes would tell
> the user how much was actually assigned.
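
A toy user-space model of the three steps described above, only to
illustrate the write/read semantics. The class, its fields, and the counter
limit check are assumptions made for the sketch; the real prototype works on
rdtgroup structs and sends actual IPIs, which are only counted here.

```python
class CounterAssignments:
    """Models one mbm_{local,total}_bytes_assigned file."""

    def __init__(self, num_counters, domains):
        self.num_counters = num_counters  # assumed limit of assignable counters
        self.domains = list(domains)      # monitoring domain ids
        self.assigned = []                # mongroup paths currently assigned
        self.ipis_sent = 0                # batched per-domain updates so far

    def write(self, text):
        # Whitespace-delimited list of mongroup paths; drop duplicates.
        groups = list(dict.fromkeys(text.split()))
        if len(groups) > self.num_counters:
            raise ValueError("more groups than assignable counters")
        # Steps 1 and 2: drop every old assignment, then assign the new list.
        self.assigned = groups
        # Step 3: all register updates for a domain go out as one IPI.
        self.ipis_sent += len(self.domains)

    def read(self):
        # Reading back reports what is actually assigned.
        return " ".join(self.assigned)
```

Writing an empty string unassigns everything, which matches the
global-assignment case described above.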

Yes. This interface can be extended to ABMC as a global assignment option.
If your patches are ready, I can add them on top of my ABMC series.
Or, if you want to add the support later, I will go ahead with the current
base ABMC support.
Let me know.
--
Thanks
Babu Moger