Re: [RFC PATCH 3/7] x86/resctrl: Add driver callback when directories are removed

From: Reinette Chatre
Date: Fri May 05 2023 - 19:19:40 EST


Hi Tony,

On 4/20/2023 3:06 PM, Tony Luck wrote:
> When a resctrl directory is removed, entities attached to that
> directory are reassigned with the CLOSID and RMID of the parent
> directory.

Could you please elaborate what you mean with "entities"? In
resctrl there are tasks needing to move but that is done in resctrl.
Would drivers be doing any manipulation of the tasks' closid/rmid?

>
> Add a callback function so a driver can reset the CLOSID and RMID
> of any resources attached to the removed directory.

I expect this behavior to be different between the removal of
a monitoring group vs a control group (more later) ...


> @@ -3495,11 +3495,18 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
> static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> {
> struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> + struct resctrl_driver *d;
> int cpu;
>
> /* Give any tasks back to the parent group */
> rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
>
> + list_for_each_entry(d, &drivers, list) {
> + if (d->rmdir)
> + d->rmdir(rdtgrp->closid, rdtgrp->mon.rmid,
> + prdtgrp->closid, prdtgrp->mon.rmid);
> + }
> +

This seems tricky ... if I understand correctly the API provides
limited properties directly to driver to keep things safe but at the
same time the properties need to accommodate all driver usages. All drivers
would need an update if something new is needed.

For example, this seems to ignore whether a resource group is exclusive
or not - could group removal trigger behavior that can circumvent this
restriction?

> /* Update per cpu rmid of the moved CPUs first */
> for_each_cpu(cpu, &rdtgrp->cpu_mask)
> per_cpu(pqr_state.default_rmid, cpu) = prdtgrp->mon.rmid;
> @@ -3535,6 +3542,7 @@ static int rdtgroup_ctrl_remove(struct rdtgroup *rdtgrp)
>
> static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> {
> + struct resctrl_driver *d;
> int cpu;
>
> /* Give any tasks back to the default group */
> @@ -3544,6 +3552,11 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> cpumask_or(&rdtgroup_default.cpu_mask,
> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>
> + list_for_each_entry(d, &drivers, list) {
> + if (d->rmdir)
> + d->rmdir(rdtgrp->closid, rdtgrp->mon.rmid, 0, 0);
> + }
> +
> /* Update per cpu closid and rmid of the moved CPUs first */
> for_each_cpu(cpu, &rdtgrp->cpu_mask) {
> per_cpu(pqr_state.default_closid, cpu) = rdtgroup_default.closid;

... is the expectation that the driver would look at the latter parameters to
determine if a monitor or control group is removed? I think that may be troublesome
since API like this relies on driver needing to have a lot of insight into
internal implementation choices of resctrl (which could potentially change?).
If I understand correctly the driver makes assumptions like: (a) default control group
always is assigned closid 0 and rmid 0 and (b) default control group is never removed.
I think that it should not be required for drivers to have knowledge like this
(and then also risk that these assumptions may change).

Reinette