Re: [PATCH v5 1/1] x86/resctrl: Fix task CLOSID/RMID update race

From: Reinette Chatre
Date: Fri Dec 16 2022 - 14:36:34 EST


Hi Peter,

On 12/16/2022 2:26 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Fri, Dec 16, 2022 at 12:52 AM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>>
>> For a fix a Fixes: tag is expected. It looks like the following
>> may be relevant:
>> Fixes: ae28d1aae48a ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
>> Fixes: 0efc89be9471 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
>
> Thanks for preparing these lines. I'll include them.
>
>>
>>> Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
>>
>> Also, please do let the stable team know about this via:
>> Cc: stable@xxxxxxxxxxxxxxx
>
> I wasn't sure if this fix met the criteria for backporting to stable,
> because I found it by code inspection, so it doesn't meet the "bothers
> people" criterion.

That is fair. Encountering the issue does not produce an obvious error; the
consequence is that there could be intervals during which tasks do not
get the resources/measurements they are entitled to. I do think that this
will be hard to test in order to demonstrate the impact.

My understanding was that this was encountered in your environment where
actions are taken at large scale. If this remains theoretical then there is
no need to include the stable team. With the Fixes: tags they can decide
whether it is something they would like to carry.

>
> However I can make a case that it's exploitable:
>
> "In a memory bandwidth-metered compute host, malicious jobs could
> exploit this race to remain in a previous CLOSID or RMID in order to
> dodge a class-of-service downgrade imposed by an admin or steal
> bandwidth."
>

I am not comfortable with such high-level speculation. For this
exploit to work, the malicious job needs to control scheduler decisions
as well as time the exploit to coincide with the admin's decision to move
the target task.


Reinette