Re: [PATCH RFC] ioctl based CAT interface

From: Marcelo Tosatti
Date: Mon Nov 16 2015 - 20:02:20 EST


On Mon, Nov 16, 2015 at 02:39:03PM -0200, Marcelo Tosatti wrote:
> On Mon, Nov 16, 2015 at 10:07:56AM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 13, 2015 at 03:33:04PM -0200, Marcelo Tosatti wrote:
> > > On Fri, Nov 13, 2015 at 05:51:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Nov 13, 2015 at 02:39:33PM -0200, Marcelo Tosatti wrote:
> > > > > + * * one tcrid entry can be in different locations
> > > > > + * in different sockets.
> > > >
> > > > NAK on that without cpuset integration.
> > > >
> > > > I do not want freely migratable tasks having radically different
> > > > performance profiles depending on which CPU they land.
> > >
> > > Ok, so, configuration:
> > >
> > >
> > > Socket-1: pinned thread-A with 80% of L3 reserved
> > > Socket-2: 100% of L3 free
> > >
> > >
> > > So it is a problem if a thread running on socket-2 is scheduled to
> > > socket-1 because performance is radically different, fine.
> > >
> > > Then one way to avoid that is to not allow freely migratable tasks
> > > to move to Socket-1. Fine.
> > >
> > > Then you want to use cpusets for that.
> > >
> > > Can you fill in the blanks what is missing here?
> >
> > I'm still not seeing what the problem with CAT-cgroup is.
> >
> > /cgroups/cpuset/
> >   socket-1/cpus  = $socket-1
> >   socket-1/tasks = $thread-A
> >
> >   socket-2/cpus  = $socket-2
> >   socket-2/tasks = $thread-B
> >
> > /cgroups/cat/
> >   group-A/bitmap = 0x3F / 0xFF
> >   group-A/tasks  = $thread-A
> >
> >   group-B/bitmap = 0xFF / 0xFF
> >   group-B/tasks  = $thread-B
> >
> >
> > That gets you thread-A on socket-1 with 6/8 of the L3 and thread-B on
> > socket-2 with 8/8 of the L3.
>
> Going that route, might as well expose the region shared with HW
> to userspace and let userspace handle the problem of contiguous free regions,
> which means the cgroups bitmask maps one-to-one to HW bitmap.
>
> All that is necessary then is to modify the Intel patches to
>
> 1) Support bitmaps per socket.

Consider the following scenario: one server with two sockets.

    socket-1                 socket-2
    [  [***]         ]       [         [***]  ]
    L3 cache bitmap          L3 cache bitmap

[***] marks the region shared with HW, as reported by CPUID (see the
Intel documentation).

socket-1.shared_region_with_hw = [bit 2, bit 5]
socket-2.shared_region_with_hw = [bit 16, bit 18]
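
For reference, those per-socket values can be read from user space. A
minimal sketch, assuming GCC/Clang's <cpuid.h> and a process pinned to a
CPU on the socket of interest, queries CPUID leaf 0x10, subleaf 1
(EAX[4:0] = CBM length minus one, EBX = bits shared with other agents
such as HW):

/* Sketch only: read the L3 CAT CBM length and the shared-with-HW bitmask
 * for the socket this CPU belongs to.  Availability checks (max CPUID
 * leaf, CAT feature enumeration) are omitted for brevity. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=10H, ECX=1): L3 cache allocation enumeration */
	__cpuid_count(0x10, 1, eax, ebx, ecx, edx);

	unsigned int cbm_len = (eax & 0x1f) + 1; /* EAX[4:0] = CBM length - 1 */
	unsigned int shared  = ebx;              /* ways shared with HW/other agents */

	printf("CBM length: %u bits, shared-with-HW mask: 0x%x\n",
	       cbm_len, shared);
	return 0;
}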

Given that your application is critical, you do not want it to share any
reservation with HW. I was informed that there is no guarantee these
regions end up in the same location on different sockets. Let's say you
need 15 bits of reservation, and the total is 20 bits. One possibility
would be:

socket-1.reservation = [bit 6, bit 20]
socket-2.reservation = [bit 1, bit 15]
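
Purely as an illustration of how such a placement could be computed (this
helper is not part of any patch), given a socket's CBM length and its
shared-with-HW mask you would search for a contiguous run of free bits,
since CAT requires the CBM to be contiguous:

/*
 * Illustration only, not from any patchset: pick a contiguous reservation
 * of 'want' bits inside a 'cbm_len'-bit CBM that does not overlap the
 * bits shared with HW on this socket.  Returns 0 if it does not fit.
 */
static unsigned long pick_reservation(unsigned int cbm_len,
				      unsigned long shared,
				      unsigned int want)
{
	unsigned int lo;

	for (lo = 0; lo + want <= cbm_len; lo++) {
		unsigned long mask = ((1UL << want) - 1) << lo;

		if (!(mask & shared))
			return mask;
	}
	return 0;
}

Run once per socket with that socket's CPUID values, this naturally yields
reservations at different offsets on different sockets, which is exactly
what the interface has to be able to express.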

For the current Intel CAT patchset, this restriction exists:

static int cbm_validate_rdt_cgroup(struct intel_rdt *ir, unsigned long cbmvalue)
{
	struct cgroup_subsys_state *css;
	struct intel_rdt *par, *c;
	unsigned long cbm_tmp = 0;
	int err = 0;

	if (!cbm_validate(cbmvalue)) {
		err = -EINVAL;
		goto out_err;
	}

	par = parent_rdt(ir);
	clos_cbm_table_read(par->closid, &cbm_tmp);
	if (!bitmap_subset(&cbmvalue, &cbm_tmp, MAX_CBM_LENGTH)) {
		err = -EINVAL;
		goto out_err;
	}
	[...]
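
For readers who do not have the patch in front of them:
bitmap_subset(&cbmvalue, &cbm_tmp, MAX_CBM_LENGTH) succeeds only if every
bit of the proposed CBM is also set in the parent cgroup's CBM, i.e. a
child group can only be given cache ways its parent already owns. Reduced
to a single-word mask, the check is equivalent to:

/* Single-word equivalent of the bitmap_subset() check above: the new CBM
 * may only contain bits that are already present in the parent's CBM. */
static int cbm_within_parent(unsigned long cbm, unsigned long parent_cbm)
{
	return (cbm & ~parent_cbm) == 0;
}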


Can you (or the author of the patch) explain why this restriction is
here?

If the restriction has to be maintained, then one hierarchy per socket
will be necessary to support different bitmaps per socket.

If the restriction can be removed, then non-hierarchical support
could look like:

/cgroups/cat/group-A/tasks = $thread-A
/cgroups/cat/group-A/socket-1/bitmap = 0x3F / 0xFF
/cgroups/cat/group-A/socket-2/bitmap = 0x... / 0xFF

Or one l3_cbm file containing one mask per socket,
separated by commas, similar to

/sys/devices/system/node/node0/cpumap
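
As a sketch of that second option (the exact format is an assumption here,
modelled on the comma-separated masks of the node cpumap file, not
something any patch defines), user space could parse one hex mask per
socket like this:

/* Hypothetical format: "l3_cbm" holds one hex CBM per socket, comma
 * separated, e.g. "3f,ff" for a two-socket system.  Modelled on the
 * cpumap layout; not taken from any existing patch. */
#include <stdlib.h>
#include <string.h>

static int parse_l3_cbm(const char *buf, unsigned long *cbm, int nr_sockets)
{
	char *dup = strdup(buf), *tok, *save = NULL;
	int i = 0;

	if (!dup)
		return -1;
	for (tok = strtok_r(dup, ",\n", &save);
	     tok && i < nr_sockets;
	     tok = strtok_r(NULL, ",\n", &save))
		cbm[i++] = strtoul(tok, NULL, 16);
	free(dup);

	return i == nr_sockets ? 0 : -1;
}

With that layout, parse_l3_cbm("3f,ff", cbm, 2) fills cbm[0] = 0x3F and
cbm[1] = 0xFF, i.e. one independent mask per socket.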

> 2) Remove hierarchical support.

There is nothing hierarchical in CAT; it's flat.

Each set of tasks is associated with a number of bits
in each socket's L3 CBM mask.
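
Stated as a data structure (the names below are invented for this sketch
and do not come from the Intel patchset), the flat model is just one CBM
per socket attached to each group of tasks:

/* Illustration only: a flat mapping, no hierarchy involved.  Each group
 * of tasks owns a class of service and one capacity bitmask per socket. */
#define MAX_SOCKETS	8

struct cat_group {
	int		closid;			/* class of service id */
	unsigned long	l3_cbm[MAX_SOCKETS];	/* one CBM per socket */
};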

> 3) Lazy enforcement (which can be done later as an improvement).
>