Re: [PATCH 0/3] x86/resctrl: Fix a few issues in moving a task to a resource group

From: Reinette Chatre
Date: Mon Dec 14 2020 - 13:40:26 EST

In reply to: Valentin Schneider: "Re: [PATCH 0/3] x86/resctrl: Fix a few issues in moving a task to a resource group"

Hi Valentin,

On 12/11/2020 12:46 PM, Valentin Schneider wrote:
>
> On 03/12/20 23:25, Reinette Chatre wrote:
>> Valentin's series in [2] ends by adding memory barriers to support the
>> updating of the task_struct from one CPU and the usage of the task_struct data
>> from another CPU. This work is still needed and as discussed with Valentin in
>> that thread the work would be re-evaluated by him after seeing how this series
>> turns out.

Thank you very much for taking a look.

> So the "problematic" pattern is still there: a context switch can happen
> concurrently with a write to the switching-to-tasks's {closid, rmid}.
> Accesses to these fields would thus need to be wrapped by READ_ONCE() &
> WRITE_ONCE().

ok.

> Thinking a bit more (too much?) about it, we could limit ourselves to
> wrapping only reads not protected by the rdtgroup_mutex: the only two
> task_struct {closid, rmid} writers are
> - rdtgroup_move_task()
> - rdt_move_group_tasks()
> and they are both invoked while holding said mutex. Thus, a reader holding
> the mutex cannot race with a write, so load tearing ought to be safe.

The reads that are not protected by the rdtgroup_mutex can be found in
__resctrl_sched_in(). It thus sounds to me like the changes to this
function proposed in your patch [1] are what is needed? It is not clear
to me how the pairing would work in this case though. If I understand
correctly, the goal is for the writes to closid/rmid in the functions
you mention above to be paired with the reads in __resctrl_sched_in(),
and it is not clear how adding a single READ_ONCE() would accomplish
this pairing by itself.
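
To make the pairing question concrete, here is a minimal userspace
sketch, not kernel code. The READ_ONCE()/WRITE_ONCE() macros are
simplified stand-ins built on volatile accesses, and task_sketch,
pqr_sketch, move_task_sketch() and sched_in_sketch() are hypothetical
names used only for illustration. It assumes GCC or Clang for
__typeof__.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE() and
 * WRITE_ONCE(). Assumption: GCC or Clang (__typeof__ is an extension).
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical cut-down task_struct with only the two fields discussed. */
struct task_sketch {
	uint32_t closid;
	uint32_t rmid;
};

/* Hypothetical snapshot of what a context switch would program. */
struct pqr_sketch {
	uint32_t closid;
	uint32_t rmid;
};

/* Writer side: in the kernel this would run with rdtgroup_mutex held. */
static void move_task_sketch(struct task_sketch *t, uint32_t closid,
			     uint32_t rmid)
{
	WRITE_ONCE(t->closid, closid);
	WRITE_ONCE(t->rmid, rmid);
}

/* Lockless reader side, standing in for the reads at context switch. */
static struct pqr_sketch sched_in_sketch(struct task_sketch *t)
{
	struct pqr_sketch pqr;

	/* The compiler emits each of these loads exactly once... */
	pqr.closid = READ_ONCE(t->closid);
	/* ...but a concurrent writer may still run between the two loads. */
	pqr.rmid = READ_ONCE(t->rmid);
	return pqr;
}
```

In this sketch the marked accesses only constrain how each individual
field is loaded or stored. Nothing in it ties the closid read to the
rmid read, so a reader racing with move_task_sketch() could observe one
new and one old value.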

It is also not entirely clear to me what the problematic scenario could
be. If I understand correctly, the risk is (as you explained in your
commit message) that a CPU could have its {closid, rmid} fields read
locally (in __resctrl_sched_in()) while they are concurrently being
written to from another CPU (in rdtgroup_move_task() and
rdt_move_group_tasks(), as you state above). If this happens, a task
being moved may be scheduled in with its old closid/rmid. However, the
update of closid/rmid in rdtgroup_move_task()/rdt_move_group_tasks() is
followed by smp_call_function_xx(), where the registers are updated
with preemption disabled and are thus protected against __switch_to.
If a task was incorrectly scheduled in with its old closid/rmid, would
it not be corrected at this point?
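
The interleaving described above can be replayed in a single-threaded
userspace model. This is only a sketch under stated assumptions:
cpu_sketch, sched_in_commit_sketch(), ipi_update_sketch() and
replay_stale_sched_in() are hypothetical names, the "register" is a
pair of plain struct fields, and the cross-CPU call is an ordinary
function call made after the write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down task_struct with only the two fields discussed. */
struct task_sketch {
	uint32_t closid;
	uint32_t rmid;
};

/* Hypothetical model of one CPU: running task plus its programmed values. */
struct cpu_sketch {
	struct task_sketch *curr;
	uint32_t hw_closid;
	uint32_t hw_rmid;
};

/* Context switch completes using values sampled earlier from the task. */
static void sched_in_commit_sketch(struct cpu_sketch *cpu,
				   struct task_sketch *next,
				   uint32_t closid, uint32_t rmid)
{
	cpu->curr = next;
	cpu->hw_closid = closid;
	cpu->hw_rmid = rmid;
}

/* Stand-in for the cross-CPU callback: reload from the current task. */
static void ipi_update_sketch(struct cpu_sketch *cpu)
{
	if (cpu->curr != NULL) {
		cpu->hw_closid = cpu->curr->closid;
		cpu->hw_rmid = cpu->curr->rmid;
	}
}

/* Replays one interleaving; returns 1 if the register ends up current. */
static int replay_stale_sched_in(void)
{
	struct task_sketch t = { .closid = 1, .rmid = 1 };
	struct cpu_sketch cpu = { NULL, 0, 0 };

	/* Step 1: the context switch samples the old values. */
	uint32_t seen_closid = t.closid;
	uint32_t seen_rmid = t.rmid;

	/* Step 2: the mover, on another CPU, writes the new values. */
	t.closid = 2;
	t.rmid = 2;

	/* Step 3: the context switch commits the stale sample. */
	sched_in_commit_sketch(&cpu, &t, seen_closid, seen_rmid);
	assert(cpu.hw_closid != t.closid); /* stale at this point */

	/* Step 4: the cross-CPU call issued after the write runs. */
	ipi_update_sketch(&cpu);

	return cpu.hw_closid == t.closid && cpu.hw_rmid == t.rmid;
}
```

In this one ordering the stale values are overwritten by the later
update. The model says nothing about other orderings, for example one
where the callback runs before the task becomes current on that CPU.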

Thank you

Reinette

[1] https://lore.kernel.org/lkml/20201123022433.17905-4-valentin.schneider@xxxxxxx/
