Re: [PATCH v5 1/1] x86/resctrl: Fix task CLOSID/RMID update race

From: Peter Newman
Date: Mon Dec 19 2022 - 05:22:21 EST


Hi Reinette,

On Fri, Dec 16, 2022 at 8:36 PM Reinette Chatre
<reinette.chatre@xxxxxxxxx> wrote:
> On 12/16/2022 2:26 AM, Peter Newman wrote:
> > However I can make a case that it's exploitable:
> >
> > "In a memory bandwidth-metered compute host, malicious jobs could
> > exploit this race to remain in a previous CLOSID or RMID in order to
> > dodge a class-of-service downgrade imposed by an admin or steal
> > bandwidth."
> >
>
> I am not comfortable with such high level speculation. For this
> exploit to work the malicious jobs needs to control scheduler decisions
> as well as time the exploit with the admin's decision to move the target task.

I imagined if the malicious job maintained a large pool of threads in
short sleep-loops, after it sees a drop in bandwidth, it can cue the
threads to measure their memory bandwidth to see if any got past the
CLOSID change.

I don't know whether having fast, unmetered bandwidth until the next
context switch is enough of a payoff to bother with this, though. Our
workloads have too many context switches for this to be worth very much,
so I'm fine with letting others decide how important this fix is to
them.

-Peter