Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Reinette Chatre
Date: Thu Mar 07 2024 - 17:53:45 EST


Hi Peter,

On 3/7/2024 2:33 PM, Peter Newman wrote:
> Hi Reinette,
>
> On Thu, Mar 7, 2024 at 12:41 PM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>>
>> Hi Peter,
>>
>> On 3/7/2024 10:57 AM, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Mon, Mar 4, 2024 at 2:24 PM Moger, Babu <bmoger@xxxxxxx> wrote:
>>>> Based on our discussion, I am listing a few examples here. Let me know if
>>>> I missed something.
>>>>
>>>> mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>> 1. Assign both local and total counters to default group on domain 0 and 1.
>>>> $echo "//00=lt;01=lt" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>>
>>>> 2. Assign a total event to mon group inside the default group for both
>>>> domain 0 and 1.
>>>>
>>>> $mkdir /sys/fs/resctrl/mon_groups/mon_a
>>>> $echo "/mon_a/00+t;01+t" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>> /mon_a/00=t;01=t
>>>>
>>>> 3. Assign a local event to a non-default control mon group on both
>>>> domain 0 and 1.
>>>> $mkdir /sys/fs/resctrl/ctrl_a
>>>> $echo "/ctrl_a/00=l;01=l" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>> /mon_a/00=t;01=t
>>>> /ctrl_a/00=l;01=l
>>>>
>>>> 4. Assign both counters to a mon group inside another control
>>>> group(non-default).
>>>> $mkdir /sys/fs/resctrl/ctrl_a/mon_ab/
>>>> $echo "ctrl_a/mon_ab/00=lt;01=lt" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>> /mon_a/00=t;01=t
>>>> /ctrl_a/00=l;01=l
>>>> ctrl_a/mon_ab/00=lt;01=lt
>>>>
>>>> 5. Unassign a counter from a mon group inside another control
>>>> group(non-default).
>>>> $echo "ctrl_a/mon_ab/00-l;01-l" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>> /mon_a/00=t;01=t
>>>> /ctrl_a/00=l;01=l
>>>> ctrl_a/mon_ab/00=t;01=t
>>>>
>>>> 6. Unassign all the counters on a specific group.
>>>> $echo "ctrl_a/mon_ab/00=_" >
>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>> //00=lt;01=lt
>>>> /mon_a/00=t;01=t
>>>> /ctrl_a/00=l;01=l
>>>> ctrl_a/mon_ab/00=_;01=_
>>>
>>> The use case I'm interested in is iterating 32 counters over 256
>>> groups[1]. If it's not possible to reassign 32 counters in a single
>>> write system call, with just one IPI per domain per batch reassignment
>>> operation, then I don't see any advantage over the original proposal
>>> with the assignment control file in every group directory. We already
>>> had fine-grained control placing assign/unassign nodes throughout the
>>> directory hierarchy, with the scope implicit in the directory
>>> location.
>>
>> The intent of this interface is to support modification of several
>> groups with a single write. These examples only show impact to a single
>> group at a time, but multiple groups can be modified by separating
>> configurations with a "\n". I believe Babu was planning to add some
>> of these examples in his next iteration since it is not obvious yet.
>>
>>>
>>> The interface I proposed in [1] aims to reduce the per-domain IPIs by
>>> a factor of the number of counters, rather than sending off 2 rounds
>>> of IPIs to each domain for each monitoring group.
>>
>> I understood the proposed interface appeared to focus on one use case
>> while the goal is to find an interface to support all requirements.
>> With this proposed interface it is possible to make large scale changes
>> with a single sysfs write.
>
> Ok I see you requested[1] one such example earlier.
>
> From what I've read, is this what you had in mind of reassigning 32
> counters from the first 16 groups to the next?
>
> I had found that it's hard to get a single write() syscall out of a
> string containing newlines, so I'm using one explicit call:

Apologies, but this is not clear to me. Could you please elaborate?

If you are referring to testing via shell you can try ANSI-C Quoting like:
echo -n $'c1/m1/00=_\nc2/m2/00=_\n'
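
If the test is driven from a program rather than the shell, the same
multi-group string can be handed to one write() call. A rough sketch
follows, in Python only for brevity. build_batch() and write_batch() are
hypothetical helper names, the group names and domain list are made up,
and the syntax is the one still under discussion in this thread, so
treat it as an illustration and not as the final interface. It also
assumes the file accepts the whole buffer in one write.

```python
import os

def build_batch(unassign, assign, domains):
    """Return one newline-separated string covering every group.

    Hypothetical helper: the "=_" and "=lt" tokens follow the syntax
    proposed in this thread, which may still change.
    """
    lines = []
    for group in unassign:
        lines.append(group + "/" + ";".join(f"{d:02d}=_" for d in domains))
    for group in assign:
        lines.append(group + "/" + ";".join(f"{d:02d}=lt" for d in domains))
    return "\n".join(lines) + "\n"

def write_batch(path, batch):
    """Hand the whole batch to the file with a single os.write() call."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # os.write() maps to one write() system call and returns the
        # number of bytes accepted; the caller should check it.
        return os.write(fd, batch.encode())
    finally:
        os.close(fd)
```

For example, write_batch("/sys/fs/resctrl/info/L3_MON/mbm_assign_control",
build_batch(["c1/m1"], ["c1/m16"], range(16))) would move the counters
from c1/m1 to c1/m16 on domains 0 to 15 in one call, assuming those
groups exist.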

>
> write([mbm_assign_control fd],
> "/c1/m1/00=_;02=_;03=_;04=_;05=_;06=_;07=_;08=_;09=_;10=_;11=_;12=_;13=_;14=_;15=_\n"
> "/c1/m2/00=_;01=_;02=_;03=_;04=_;05=_;06=_;07=_;08=_;09=_;10=_;11=_;12=_;13=_;14=_;15=_\n"
> "/c1/m3/00=_;01=_;02=_;03=_;04=_;05=_;06=_;07=_;08=_;09=_;10=_;11=_;12=_;13=_;14=_;15=_\n"
> [...]
> "/c1/m14/00=_;01=_;02=_;03=_;04=_;05=_;06=_;07=_;08=_;09=_;10=_;11=_;12=_;13=_;14=_;15=_\n"
> "/c1/m15/00=_;01=_;02=_;03=_;04=_;05=_;06=_;07=_;08=_;09=_;10=_;11=_;12=_;13=_;14=_;15=_\n"
> "/c1/m16/00=lt;01=lt;02=lt;03=lt;04=lt;05=lt;06=lt;07=lt;08=lt;09=lt;10=lt;11=lt;12=lt;13=lt;14=lt;15=lt\n"
> "/c1/m17/00=lt;01=lt;02=lt;03=lt;04=lt;05=lt;06=lt;07=lt;08=lt;09=lt;10=lt;11=lt;12=lt;13=lt;14=lt;15=lt\n"
> "/c1/m18/00=lt;01=lt;02=lt;03=lt;04=lt;05=lt;06=lt;07=lt;08=lt;09=lt;10=lt;11=lt;12=lt;13=lt;14=lt;15=lt\n"
> [...]
> "/c1/m30/00=lt;01=lt;02=lt;03=lt;04=lt;05=lt;06=lt;07=lt;08=lt;09=lt;10=lt;11=lt;12=lt;13=lt;14=lt;15=lt\n"
> "/c1/m31/00=lt;01=lt;02=lt;03=lt;04=lt;05=lt;06=lt;07=lt;08=lt;09=lt;10=lt;11=lt;12=lt;13=lt;14=lt;15=lt\n",
> size);

(so far no "/" needed as prefix)

We could also consider some syntax to mean "all domains". For example,
if no domain given then it can mean "all domains"?
So, your example could possibly also be accomplished with:

c1/m1/=_\nc1/m2/=_\nc1/m3/=_\n [...] c1/m16/=lt\nc1/m17/=lt\nc1/m18/=lt\n [...]
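
To make the "no domain means all domains" idea concrete, here is a
minimal sketch of how such a line could be expanded into the explicit
per-domain form. It is written in Python only to illustrate the
semantics; expand_all_domains() is a hypothetical name, the shorthand is
just the suggestion above, and only the "=" operator is handled.

```python
def expand_all_domains(line, domains):
    """Expand 'group/=flags' into 'group/00=flags;01=flags;...'.

    Lines that already name a domain are returned unchanged.
    Hypothetical helper illustrating the suggested shorthand only.
    """
    # Everything after the last "/" is the domain configuration.
    group, _, config = line.rpartition("/")
    if not config.startswith("="):
        return line
    flags = config[1:]
    return group + "/" + ";".join(f"{d:02d}={flags}" for d in domains)
```

With 16 domains, expand_all_domains("c1/m16/=lt", range(16)) would yield
the same text as the explicit c1/m16 line in your example, minus the
leading "/".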

Any thoughts?

Reinette