Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Moger, Babu
Date: Thu Feb 08 2024 - 12:30:11 EST


Hi Reinette,

I am trying to propose few things here to move forward based on my
assumptions. Please point me if I missed something.

On 2/5/24 16:38, Reinette Chatre wrote:
> Hi Babu,
>
> On 2/2/2024 1:57 PM, Moger, Babu wrote:
>> On 2/1/2024 10:09 PM, Reinette Chatre wrote:
>>> On 1/19/2024 10:22 AM, Babu Moger wrote:
>>>> These series adds the support for Assignable Bandwidth Monitoring Counters
>>> Not a good start ([1]).
>>
>> Yea. My bad.
>>
>>>
>>>> (ABMC). It is also called QoS RMID Pinning feature
>>>>
>>>> The feature details are documented in the  APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC). The documentation is available at
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>>
>>>> The patches are based on top of commit
>>>> 1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master)
>>>>
>>>> # Introduction
>>>>
>>>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>>>> feature only guarantees that RMIDs currently assigned to a processor will
>>>> be tracked by hardware. The counters of any other RMIDs which are no longer
>>>> being tracked will be reset to zero. The MBM event counters return
>>>> "Unavailable" for the RMIDs that are not active.
>>>>      Users can create 256 or more monitor groups. But there can be only limited
>>>> number of groups that can be give guaranteed monitoring numbers.  With ever
>>> "can be given"?
>>
>> "can give guaranteed monitoring numbers."
>>
>> I feel this looks better.
>
> Sounds good. Thank you.
>
>>
>>>
>>>> changing configurations there is no way to definitely know which of these
>>>> groups will be active for certain point of time. Users do not have the
>>>> option to monitor a group or set of groups for certain period of time
>>>> without worrying about RMID being reset in between.
>>>>      The ABMC feature provides an option to the user to assign an RMID to the
>>>> hardware counter and monitor the bandwidth for a longer duration.
>>>> The assigned RMID will be active until the user unassigns it manually.
>>>> There is no need to worry about counters being reset during this period.
>>>> Additionally, the user can specify a bitmask identifying the specific
>>>> bandwidth types from the given source to track with the counter.
>>>>
>>>> Without ABMC enabled, monitoring will work in current mode without
>>>> assignment option.
>>>>
>>>> # Linux Implementation
>>>>
>>>> Linux resctrl subsystem provides the interface to count maximum of two
>>>> memory bandwidth events per group, from a combination of available total
>>>> and local events. Keeping the current interface, users can assign a maximum
>>>> of 2 ABMC counters per group. User will also have the option to assign only
>>>> one counter to the group. If the system runs out of assignable ABMC
>>>> counters, kernel will display an error. Users need to unassign an already
>>>> assigned counter to make space for new assignments.
>>>>
>>>>
>>>> # Examples
>>>>
>>>> a. Check if ABMC support is available
>>>>     #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>>     #cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>>     llc_occupancy
>>>>     mbm_total_bytes
>>>>     mbm_total_bytes_config
>>>>     mbm_local_bytes
>>>>     mbm_local_bytes_config
>>>>     mbm_assign_capable ←  Linux kernel detected ABMC feature
>>>>
>>>> b. Check if ABMC is enabled. By default, ABMC feature is disabled.
>>>>     Monitoring works in legacy monitor mode when ABMC is not enabled.
>>>>
>>>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>     0
>>>>
>>> With the introduction of "mbm_assign_enable" the entry in mon_features seems
>>> to provide duplicate information.
>>
>> ok. We can remove the text in mon_features and keep mbm_assign_enable. We need this to enable and disable the feature.
>
> This could be improved beyond a binary "enable"/"disable" interface to user space.
> For example, the hardware can discover which "mbm counter assign" related feature
> (I'm counting the "soft RMID" here as one of the "mbm counter assign" related
> features) is supported on the platform and it can be presented to the user like:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [feature_1] feature_2 feature_3

How about this?
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
ABMC:Capable
SOFT-RMID:Capable

To enable ABMC
# echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

When ABMC is enabled:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
ABMC:Enable
SOFT-RMID:Capable

> The output indicates which features are supported by the platform and the brackets indicate
> which feature is enabled.
>
>
>>>> c. There will be new file "monitor_state" for each monitor group when ABMC
>>>>     feature is supported. However, monitor_state is not available if ABMC is
>>>>     disabled.
>>>>     
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     Unsupported
>>> This sounds potentially confusing since users will still be able to monitor
>>> the groups ...
>> How about "Assignment-Unsupported"?
>
> (please see later)
>
>>>
>>>>     
>>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC
>>>>     enabled, monitoring will work in current mode without assignment option.
>>>>     
>>>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>     779247936
>>>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>     765207488
>>>>     
>>>> e. Enable ABMC mode.
>>>>
>>>>     #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>          #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>          1
>>>>
>>>> f. Read the monitor states. By default, both total and local MBM
>>>>     events are in "unassign" state.
>>>>     
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     total=unassign;local=unassign
>>> This interface does not seem to take into account that hardware
>>> can support assignment per domain. I understand that this is
>>> not something you want to implement at this time but the user interface
>>> has to accommodate such an enhancement. This was already mentioned, and
>>> you did acknowledge the point [3] to this new version that does not
>>> reflect this is unexpected.
>>
>> Yea. Domain level assignment is not supported at this point. Do you want me to explicitly mention here?
>>
>> Please elaborate what you meant here.
>
> You have made it clear on several occasions that you do not intend to support
> domain level assignment. That may be ok but the interface you create should
> not prevent future support of domain level assignment.
>
> If my point is not clear, could you please share how this interface is able to
> support domain level assignment in the future?
>
> I am starting to think that we need a file similar to the schemata file
> for group and domain level monitor configurations.

Something like this?

By default
#cat /sys/fs/resctrl/monitor_state
default:0=total=assign,local=assign;1=total=assign,local=assign

With ABMC,
#cat /sys/fs/resctrl/monitor_state
ABMC:0=total=unassign,local=unassign;1=total=unassign,local=unassign

>
>>> My previous suggestions do seem to still stand and and I also am not able to
>>> see how Peter's requests [2] were considered. This same interface needs to
>>> accommodate usages apart from ABMC. For example, how to use this interface
>>> to address the same counter issue on AMD hardware without ABMC, and MPAM
>>> (pending James's feedback).
>>
>> Yea. Agree. Peter's comments are not addressed. I am not all clear
>> about details of Peters and James requirement.
>
> Peter listed his requirements in [1]. That email thread is a worthwhile read
> for the use cases.
>
> I believe that James is aware of this work and do hope to hear from him.
>
>>
>> With respect to ABMC here are my requirements.
>>
>> a.  Assignment needs to be done at group level.
>>
>> b. User should be able to assign each event individually. Multiple events assignment(in one command) should be supported.
>>
>> c. I have no plans to implement domain level assignment. It is done at system level.
>>
>> d. We need only couple of states.  Assigned and unassigned.
>>
>> e. monitor_state is name of file for user interface. We can change that based on comments.
>>
>> Peter, James,
>>
>> Please comment on what you want achieve in "assignment" based on the features you are working on.
>>
>> Do you want to add new states?
>>
>>>
>>> I understand that until we hear from Arm we do not know all the requirements
>>> that this interface needs to support, but I do expect this interface to
>>> at least consider requirements and usage scenarios that are already known.
>>
>> Sure. Will try that in the next version. Lets continue the discussion.
>>
>>
>>>> g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode,
>>>>     the MBA events are not available until the user assigns the events
>>>>     explicitly.
>>>>     
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>     Unsupported
>>>>     
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>     Unsupported
>>>>
>>> This needs some more thought to accommodate Peter's scenario where the counter
>>> can be expected to return the final count after the counter is disabled.
>>
>> I am not sure how to achieve this with ABMC. This may be applicable
>> to soft rmid only. In case of "soft rmid", previous readings are
>> saved in the soft rmid state.
>
> Right. Please consider this work in two parts, first, there is a generic
> interface that aims to support ABMC, "soft RMID", and MPAM. Second, there
> is using this interface to support ABMC.

Yea. But it is tough without knowing all the details of the other features.

How about?

#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
ABMC:Unassigned
#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
ABMC:Unassigned

>
>>>> h. The event llc_occupancy is not affected by ABMC mode. Users can still
>>>>     read the llc_occupancy.
>>>>
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy
>>>>     557056
>>>>
>>>> i. Now assign the total event and read the monitor_state.
>>>>     
>>>>     #echo total=assign > /sys/fs/resctrl/monitor_state
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     total=assign;local=unassign
>>>>     
>>> I do not see the "global assign/unassign" scenario addressed.
>>
>> I am not all clear on meaning of "global assign/unassign". Does it
>> mean looping thru all the groups and assign the RMIDs?
>
> Please see [1].
>
>
>> It may not work in many cases. In case of ABMC, we have only limited
>> number of hw counters. It will fail after hardware runs out of
>> counters. It is better done selectively based on which group user is
>> interested in.
>
> Right. This is one more item where the generic interface needs to
> accommodate different hardware implementations. Perhaps this could
> be one of the "features" exposed by (global) mbm_assign that the
> user can "enable"/"disable" on demand?
>
>> But it can be done later if we find a use case for that.
>
> There already exists a use case as presented by Peter in support
> of AMD hardware without ABMC, no?

Yes. There is a use case. But seems like the use case is mostly applicable
to soft-rmid feature.

We can tie the global assign only to soft-rmid.

# echo SOFT-RMID:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

Because this is soft-rmid, call global assign method.

# echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

Because this is ABMC, do the steps required just to enable ABMC.
Don't do individual assignment

>
>>> This version seems to ignore (without discussion) a lot of earlier
>>> feedback.
>>
>> Please feel free comment. There are various threads of discussion. I may have missed.
>>
>
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@xxxxxxxxxxxxxx/

--
Thanks
Babu Moger