Re: [PATCH 2/2] x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount

From: Fenghua Yu
Date: Fri Nov 25 2016 - 21:37:57 EST


On Wed, Nov 23, 2016 at 03:23:50PM +0100, Thomas Gleixner wrote:
> On Fri, 18 Nov 2016, Fenghua Yu wrote:
> Reworked untested patch below.

The reworked patch passes my baisc tests. But I have a quesiton on
rdt_move_group_tasks() (please see below).

>
> Thanks,
>
> tglx
>
> 8<-----------------
> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -194,12 +194,13 @@ static int rdtgroup_cpus_show(struct ker
> /*
> * This is safe against intel_rdt_sched_in() called from __switch_to()
> * because __switch_to() is executed with interrupts disabled. A local call
> - * from rdt_update_percpu_closid() is proteced against __switch_to() because
> + * from rdt_update_closid() is proteced against __switch_to() because
> * preemption is disabled.
> */
> -static void rdt_update_cpu_closid(void *v)
> +static void rdt_update_cpu_closid(void *closid)
> {
> - this_cpu_write(cpu_closid, *(int *)v);
> + if (closid)
> + this_cpu_write(cpu_closid, *(int *)closid);
> /*
> * We cannot unconditionally write the MSR because the current
> * executing task might have its own closid selected. Just reuse
> @@ -208,14 +209,23 @@ static void rdt_update_cpu_closid(void *
> intel_rdt_sched_in();
> }
>
> -/* Update the per cpu closid and eventually the PGR_ASSOC MSR */
> -static void rdt_update_percpu_closid(const struct cpumask *cpu_mask, int closid)
> +/*
> + * Update the PGR_ASSOC MSR on all cpus in @cpu_mask,
> + *
> + * Per task closids must have been set up before calling this function.
> + *
> + * The per cpu closids are updated with the smp function call, when @closid
> + * is not NULL. If @closid is NULL then all affected percpu closids must
> + * have been set up before calling this function.
> + */
> +static void
> +rdt_update_closid(const struct cpumask *cpu_mask, int *closid)
> {
> int cpu = get_cpu();
>
> if (cpumask_test_cpu(cpu, cpu_mask))
> - rdt_update_cpu_closid(&closid);
> - smp_call_function_many(cpu_mask, rdt_update_cpu_closid, &closid, 1);
> + rdt_update_cpu_closid(closid);
> + smp_call_function_many(cpu_mask, rdt_update_cpu_closid, closid, 1);
> put_cpu();
> }
>
> @@ -264,7 +274,7 @@ static ssize_t rdtgroup_cpus_write(struc
> /* Give any dropped cpus to rdtgroup_default */
> cpumask_or(&rdtgroup_default.cpu_mask,
> &rdtgroup_default.cpu_mask, tmpmask);
> - rdt_update_percpu_closid(tmpmask, rdtgroup_default.closid);
> + rdt_update_closid(tmpmask, &rdtgroup_default.closid);
> }
>
> /*
> @@ -278,7 +288,7 @@ static ssize_t rdtgroup_cpus_write(struc
> continue;
> cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
> }
> - rdt_update_percpu_closid(tmpmask, rdtgrp->closid);
> + rdt_update_closid(tmpmask, &rdtgrp->closid);
> }
>
> /* Done pushing/pulling - update this group with new mask */
> @@ -807,18 +817,49 @@ static int reset_all_cbms(struct rdt_res
> }
>
> /*
> - * Forcibly remove all of subdirectories under root.
> + * Move tasks from one to the other group. If @from is NULL, then all tasks
> + * in the systems are moved unconditionally (used for teardown).
> + *
> + * If @mask is not NULL the cpus on which moved tasks are running are set
> + * in that mask so the update smp function call is restricted to affected
> + * cpus.
> */
> -static void rmdir_all_sub(void)
> +static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
> + struct cpumask *mask)
> {
> - struct rdtgroup *rdtgrp, *tmp;
> struct task_struct *p, *t;
>
> - /* move all tasks to default resource group */
> read_lock(&tasklist_lock);
> - for_each_process_thread(p, t)
> - t->closid = 0;
> + for_each_process_thread(p, t) {
> + if (!from || t->closid == from->closid) {
> + t->closid = to->closid;
> +#ifdef CONFIG_SMP
> + /*
> + * This is safe on x86 w/o barriers as the ordering
> + * of writing to task_cpu() and t->on_cpu is
> + * reverse to the reading here. The detection is
> + * inaccurate as tasks might move or schedule
> + * before the smp function call takes place. In
> + * such a case the function call is pointless, but
> + * there is no other side effect.
> + */

If process p1 is running on CPU1 before this point,

> + if (mask && t->on_cpu)
> + cpumask_set_cpu(task_cpu(t), mask);

If between CPU1 is set in mask and rdt_update_closid(tmpmask, NULL) is
called, p1 is switched to CPU2, and process p2 with its own closid
(e.g. 2) is switched to CPU1.

Then closid in PQR_ASSOC is set incorrectly as 0 instead of 2 on CPU1.
0 may stay in PQR_ASSOC until next context switch which may take long time
in cases of real time or HPC.

Don't we need to care this situation? In this situation, the function call
is not "pointless" but it's wrong, right?

Thanks.

-Fenghua