Re: [PATCH v4 13/18] x86/intel_rdt: Add mkdir to resctrl file system

From: Fenghua Yu
Date: Mon Oct 17 2016 - 19:55:28 EST


On Mon, Oct 17, 2016 at 04:37:30PM -0700, Luck, Tony wrote:
> On Tue, Oct 18, 2016 at 01:20:36AM +0200, Thomas Gleixner wrote:
> > On Mon, 17 Oct 2016, Fenghua Yu wrote:
> > > part0: L3:0=1;1=1 closid0/cbm=1 on cache0 and closid0/cbm=1 on cache1
> > > (closid 15 on cache0 combined with 16 different closids on cache1)
> > > ...
> > > part254: L3:0=ffff;1=7fff closid15/cbm=ffff on cache0 and closid14/cbm=7fff on cache1
> > > part255: L3:0=ffff;1=ffff closid15/cbm=ffff on cache0 and closid15/cbm=ffff on cache1
> > >
> > > To utilize as many combinations as possible, we may implement a
> > > more complex allocation scheme than the current one.
> > >
> > > Does this make sense?
> >
> > Thanks for the explanation. I knew that I was missing something.
> >
> > But how is that supposed to work? The schemata files have no idea of
> > closids simply because the closids are assigned automatically. And that
> > makes the whole thing exponentially complex. You must allow creating ALL
> > rdt groups (initially as a copy of the root group), and then, when the
> > schemata file is written, you have to check whether the particular CBM value
> > for a particular domain is already in use and assign the same COSID for this
> > domain. That of course makes the whole L2 business completely diffuse
> > because you might end up with:
> >
> > Dom0 = COSID1 and Dom1 = COSID9
> >
> > So you can set the L2 for Dom0, but not for Dom1, and then if you set L2 for
> > Dom0 you must find a new COSID for Dom0. If there is none, then you must
> > reject the write and leave the admin puzzled.
> >
> > There is a reason why I suggested:
> >
> > https://lkml.kernel.org/r/alpine.DEB.2.11.1511181534450.3761@nanos
> >
> > It's certainly not perfect (missing L2 etc.), but clearly avoids exactly
> > the above issues. And it would allow you to utilize the 256 groups in an
> > understandable way.
>
> If you head down that path someone with a 4-socket system will try to
> make 16x16x16x16 = 65536 groups and "understandable" takes a bit of
> a beating. The eight socket system with 16^8 = 4G groups defies any
> rational hope. Best not to think about 16 sockets.

16^(number of L3 caches) is the theoretical upper limit on the number of
partitions a sysadmin can create. Beyond that number, allocation fails
with a "no space" error (ENOSPC). This is like other cases: e.g. creating
a very large number of directories with mkdir in one directory can
eventually fail because the file system runs out of space.
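The per-domain CLOSID sharing Thomas describes, together with the 16^N
bound above, can be sketched in C roughly as follows. This is a minimal
userspace sketch under stated assumptions, not resctrl code:
NUM_DOMAINS, NUM_CLOSIDS, struct closid_slot, get_closid(), put_closid(),
write_schemata() and max_partitions() are hypothetical names; L2 and CDP
are ignored, and releasing a group's previous CLOSIDs on a rewrite is left
out. A schemata write reuses, per domain, a CLOSID whose CBM already
matches, otherwise claims a free one, and is rejected with -ENOSPC (after
rolling back) when some domain has no CLOSID left.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical sizes matching the example in this thread:
 * two L3 cache domains, 16 CLOSIDs per domain. */
#define NUM_DOMAINS 2
#define NUM_CLOSIDS 16

/* Per-domain CLOSID table: the CBM programmed for each CLOSID and how many
 * groups currently share it (refcnt == 0 means the CLOSID is free). */
struct closid_slot {
	uint32_t cbm;
	unsigned int refcnt;
};

static struct closid_slot slots[NUM_DOMAINS][NUM_CLOSIDS];

/* Upper bound on distinct CLOSID combinations: NUM_CLOSIDS ^ NUM_DOMAINS. */
static unsigned long max_partitions(void)
{
	unsigned long n = 1;

	for (int d = 0; d < NUM_DOMAINS; d++)
		n *= NUM_CLOSIDS;
	return n;
}

/* Reuse a CLOSID in domain d that already carries cbm, or claim a free one.
 * Returns the CLOSID, or -ENOSPC if every CLOSID holds a different CBM. */
static int get_closid(int d, uint32_t cbm)
{
	int free_id = -1;

	for (int c = 0; c < NUM_CLOSIDS; c++) {
		if (slots[d][c].refcnt && slots[d][c].cbm == cbm) {
			slots[d][c].refcnt++;
			return c;
		}
		if (!slots[d][c].refcnt && free_id < 0)
			free_id = c;
	}
	if (free_id < 0)
		return -ENOSPC;
	slots[d][free_id].cbm = cbm;
	slots[d][free_id].refcnt = 1;
	return free_id;
}

/* Drop one reference to CLOSID c in domain d. */
static void put_closid(int d, int c)
{
	assert(slots[d][c].refcnt > 0);
	slots[d][c].refcnt--;
}

/* Apply one schemata line (one CBM per domain) to a group. On failure,
 * release the CLOSIDs already taken so the write is rejected as a whole. */
static int write_schemata(const uint32_t cbm[NUM_DOMAINS],
			  int closid[NUM_DOMAINS])
{
	for (int d = 0; d < NUM_DOMAINS; d++) {
		int c = get_closid(d, cbm[d]);

		if (c < 0) {
			while (--d >= 0)
				put_closid(d, closid[d]);
			return c;
		}
		closid[d] = c;
	}
	return 0;
}
```

With two domains and 16 CLOSIDs this allows at most 256 distinct CLOSID
combinations. Note that a write can be rejected only because one domain
already has 16 different CBMs programmed, even if the other domain has
room, which is the puzzling case for the admin that Thomas points out.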

>
> The L2 + L3 configuration space gets unbelievably messy too.
>
> There's a reason why I ripped out the allocation code and went with
> a simple global allocator in this version. If we decide we need something
> fancier we can adapt later. Some solutions might be transparent to
> applications, others might add a "closid" file into each directory to
> give 2nd generation applications hooks to view (and maybe control)
> which closid is used by each group.

I fully agree with Tony. We understand the complexity of the situation and
just want a simple, working solution for the first version.
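The simple global allocator Tony describes can be sketched as a single
free-CLOSID bitmap shared by all cache domains. This is a minimal sketch
assuming 16 CLOSIDs with CLOSID 0 reserved for the root group;
closid_free_map, closid_alloc() and closid_free() are illustrative names,
not necessarily what the patch uses. mkdir takes the lowest free CLOSID,
and because that CLOSID is valid on every domain, later schemata writes
never need another allocation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: 16 CLOSIDs, the same count on every cache domain. */
#define NUM_CLOSIDS 16

/* One bitmap for the whole system: bit n set means CLOSID n is free.
 * CLOSID 0 is kept for the default (root) group. */
static unsigned int closid_free_map = ((1u << NUM_CLOSIDS) - 1) & ~1u;

/* Called on mkdir: take the lowest free CLOSID, valid on every domain.
 * Returns -ENOSPC when all CLOSIDs are in use, so the mkdir fails. */
static int closid_alloc(void)
{
	for (int c = 0; c < NUM_CLOSIDS; c++) {
		if (closid_free_map & (1u << c)) {
			closid_free_map &= ~(1u << c);
			return c;
		}
	}
	return -ENOSPC;
}

/* Called on rmdir: return the CLOSID to the pool. */
static void closid_free(int closid)
{
	assert(closid > 0 && closid < NUM_CLOSIDS);
	assert(!(closid_free_map & (1u << closid)));
	closid_free_map |= 1u << closid;
}
```

The trade-off is the group count: at most 15 groups besides the root group,
whatever the number of sockets, instead of 16^N. In exchange there is no
per-domain bookkeeping, and no schemata write ever fails for lack of a
CLOSID.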

Thanks.

-Fenghua