Re: [PATCH 3/7] x86/intel_rdt: Add support for cache bit mask management

From: Vikas Shivappa
Date: Mon May 18 2015 - 15:19:23 EST

On Fri, 15 May 2015, Thomas Gleixner wrote:

On Mon, 11 May 2015, Vikas Shivappa wrote:
@@ -35,6 +36,42 @@ static struct rdt_subsys_info rdtss_info;
static DEFINE_MUTEX(rdt_group_mutex);
struct intel_rdt rdt_root_group;

+/*
+ * Mask of CPUs for writing CBM values. We only need one per-socket.

One mask? One CPU? One what? Please add comments which are
understandable and useful for people who did NOT write that code.

There is already upstream code with an exactly consistent comment - if it made sense there, shouldn't it make sense here?

/*
* Mask of CPUs for reading CQM values. We only need one per-socket.
*/
static cpumask_t cqm_cpumask;


+ */
+static cpumask_t rdt_cpumask;

+static void __cpu_cbm_update(void *info)
+{
+ unsigned int closid = *((unsigned int *)info);

So all you want is to transfer closid, right? Why having a pointer
instead of just doing

on_each_cpu_mask(...., (void *) id);

and in the handler

id = (unsigned long) info;

Will fix.


Looks too simple and readable, right?
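A userspace sketch of the pattern being suggested (run_on_cpus() standing in for on_each_cpu_mask(), the wrmsr replaced by a variable write; all names hypothetical): a small integer can be smuggled through the void *info argument itself, so the callback never dereferences a pointer to a stack variable.

```c
#include <assert.h>

/*
 * Userspace model: instead of passing &closid into the cross-CPU
 * callback, pass the value itself cast through void *.
 */
static unsigned int last_closid_written;

static void cbm_update_fn(void *info)
{
	/* Recover the value directly; no dereference needed. */
	unsigned int closid = (unsigned long)info;

	last_closid_written = closid;	/* stands in for the wrmsr() */
}

static void run_on_cpus(void (*fn)(void *), void *info)
{
	fn(info);	/* a real kernel would IPI each CPU in the mask */
}

unsigned int update_cbm(unsigned int closid)
{
	run_on_cpus(cbm_update_fn, (void *)(unsigned long)closid);
	return last_closid_written;
}
```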

+/*
+ * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
+ * which are one per CLOSid, except IA32_L3_MASK_0 on the current package.
+ * @cpu : the cpu on which the mask is updated.
+ */
+static inline void cbm_update_msrs(int cpu)
+{
+ int maxid = boot_cpu_data.x86_rdt_max_closid;
+ unsigned int i;
+
+ if (WARN_ON(cpu != smp_processor_id()))
+ return;
+
+ for (i = 1; i < maxid; i++) {

Lacks a comment why this starts with 1 and not with 0 as one would expect.

Its in the function comment just above - "except IA32_L3_MASK_0 on the current package."


+ if (ccmap[i].clos_refcnt)
+ __cpu_cbm_update(&i);
+ }
+}

This is only used by the cpu hotplug code. So why does it not live
close to it? Because it's more fun to search for functions than having
them just in place?

Ok, will fix.

+/*
+ * intel_cache_alloc_cbm_write() - Validates and writes the
+ * cache bit mask(cbm) to the IA32_L3_MASK_n
+ * and also store the same in the ccmap.
+ *
+ * CLOSids are reused for cgroups which have same bitmask.
+ * - This helps to use the scant CLOSids optimally.
+ * - This also implies that at context switch write
+ * to PQR-MSR is done only when a task with a
+ * different bitmask is scheduled in.
+ */
+static int intel_cache_alloc_cbm_write(struct cgroup_subsys_state *css,

Can you please drop these intel_ prefixes? They provide no value and
just eat space.

Well, I got ample comments asking me to add the intel_ prefix since this feature is Intel specific.
The cache monitoring code has similar prefixes as well - is that perf specific then? I just don't want people to come back and ask to make it clear that these are Intel specific.


+ struct cftype *cft, u64 cbmvalue)
+{
+ u32 max_cbm = boot_cpu_data.x86_rdt_max_cbm_len;
+ struct intel_rdt *ir = css_rdt(css);
+ unsigned long cache_mask, max_mask;
+ unsigned long *cbm_tmp;
+ unsigned int closid;
+ ssize_t err = 0;
+
+ if (ir == &rdt_root_group)
+ return -EPERM;
+ bitmap_set(&max_mask, 0, max_cbm);
+
+ /*
+ * Need global mutex as cbm write may allocate a closid.
+ */
+ mutex_lock(&rdt_group_mutex);
+ bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
+ cbm_tmp = &ccmap[ir->clos].cache_mask;
+
+ if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
+ goto out;
+

Sigh, what's the point of this bitmap shuffling? Nothing as far as I
can tell, because you first set all bits of maxmask and then AND
cbmvalue.

Again, this contradicts your earlier comment where you asked me to use the bit manipulation APIs to make it more *readable*. So do you want it more readable now, or is that no longer the goal?

https://lkml.org/lkml/2014/11/19/1145

+ /* Reset the least significant bit.*/
+ i = var & (var - 1);
+
+ if (i != (var & (var << 1)))
+ return false;

was replaced with the

if (bitmap_weight(&var, maxcbm) < min_bitmask_len)
return false;

first_bit = find_next_bit(&var, maxcbm, 0);
zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);

if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
return false;

Three lines were replaced with more to make it more readable! So does the old version come back then?
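For reference, both versions test the same property - that the set bits of the mask form one contiguous run. A userspace sketch of the two checks side by side (the second renders the find_next_bit/find_next_zero_bit logic with plain loops, since those helpers are kernel-internal):

```c
#include <stdbool.h>
#include <stdint.h>

/* Old 3-line trick: clearing the lowest set bit must equal
 * var & (var << 1) iff the set bits are contiguous. */
static bool contiguous_trick(uint64_t var)
{
	if (var == 0)
		return false;
	return (var & (var - 1)) == (var & (var << 1));
}

/* Userspace rendering of the find_next_bit-style check: find the first
 * set bit, then the first zero after it; no set bit may follow. */
static bool contiguous_findbit(uint64_t var, unsigned int maxcbm)
{
	unsigned int first = 0, zero, i;

	if (var == 0)
		return false;
	while (first < maxcbm && !(var & (1ULL << first)))
		first++;
	zero = first;
	while (zero < maxcbm && (var & (1ULL << zero)))
		zero++;
	for (i = zero; i < maxcbm; i++)
		if (var & (1ULL << i))
			return false;
	return true;
}
```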


All you achieve is, that you truncate cbmvalue to max_cbm bits. So if
cbmvalue has any of the invalid bits set, you happily ignore that.

taskset ignores the extra bits. Is that a problem here?


This whole bitmap usage is just making the code obfuscated.

u64 max_mask = (1ULL << max_cbm) - 1;

if (cbm_value & ~max_mask)
return -EINVAL;

if (cbm_value == (u64)ccmap[ir->clos].cache_mask)
return 0;

No bitmap_set, bitmap_and nothing. Simple and readable, right?

The bitmap would only be necessary, IF max_cbm is > 64. But that's not
the case and nothing in your code would survive max_cbm > 64. And as I
said before CBM_LEN is 32bit for now and I seriously doubt, that it
can ever go over 64bit simply because that would not fit into the mask
MSRs.
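A userspace sketch of the suggested plain-u64 validation (validate_cbm() is a hypothetical name; -22 stands in for -EINVAL): invalid high bits are rejected rather than silently truncated, and no bitmap helpers or dynamic allocation are involved.

```c
#include <stdint.h>

/* Reject any CBM bits above max_cbm instead of masking them off.
 * max_cbm is bounded to < 64 here, matching the thread's observation
 * that CBM_LEN cannot exceed what fits in the mask MSRs. */
static int validate_cbm(uint64_t cbm_value, unsigned int max_cbm)
{
	uint64_t max_mask;

	if (max_cbm == 0 || max_cbm > 63)
		return -22;	/* out of the modelled range */

	max_mask = (1ULL << max_cbm) - 1;

	if (cbm_value & ~max_mask)
		return -22;	/* invalid bits set: -EINVAL, not truncate */

	return 0;
}
```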

So can you please get rid of that bitmap nonsense including the
dynamic allocation and use simple and readable code constructs?

Just for the record:

struct bla {
u64 cbm_mask;
};

versus:

struct bla {
unsigned long *cbm_mask;
};

plus

bla.cbm_mask = kzalloc(sizeof(u64));

Is just utter waste of data and text space.

+ /*
+ * At this point we are sure to change the cache_mask.Hence release the
+ * reference to the current CLOSid and try to get a reference for
+ * a different CLOSid.
+ */
+ __clos_put(ir->clos);

So you drop the reference, before you make sure that you can get a new one?

That was horrible; will fix. Thanks for pointing it out.


+ if (cbm_search(cache_mask, &closid)) {
+ ir->clos = closid;
+ __clos_get(closid);
+ } else {
+ err = intel_rdt_alloc_closid(ir);
+ if (err)
+ goto out;

What happens if intel_rdt_alloc_closid() fails? ir->clos is unchanged,
but you have no reference anymore. The next call to this function or
anything else which calls __clos_put() will trigger the underflow
warning and the warning will be completely useless because the
wreckage happened before that.
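A userspace sketch of the ordering the review asks for (refcounts modelled as a plain array; alloc_closid()/switch_closid() are hypothetical names): acquire or allocate the new CLOSid first, and drop the old reference only once the switch is guaranteed to succeed, so the error path leaves the old reference intact.

```c
#define NCLOS 4

static int refcnt[NCLOS];

static int alloc_closid(void)
{
	int i;

	for (i = 0; i < NCLOS; i++)
		if (refcnt[i] == 0) {
			refcnt[i] = 1;	/* take the new reference first */
			return i;
		}
	return -1;	/* would be an -ENOSPC style error */
}

/* Switch @cur to a fresh CLOSid; on failure @cur keeps its reference. */
int switch_closid(int cur)
{
	int new = alloc_closid();

	if (new < 0)
		return cur;	/* error path: old reference still held */

	refcnt[cur]--;		/* drop the old ref only after success */
	return new;
}
```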

+static inline bool intel_rdt_update_cpumask(int cpu)
+{
+ int phys_id = topology_physical_package_id(cpu);
+ struct cpumask *mask = &rdt_cpumask;
+ int i;
+
+ for_each_cpu(i, mask) {
+ if (phys_id == topology_physical_package_id(i))
+ return false;
+ }

You must be kidding.

The RAPL and CQM code use similar loops. Do you want me to keep a separate package mask for this code, which would not run frequently at all?

For C-state transitions only the idle_notifier gets called - so this path runs only when a new package is physically added, and cache alloc needs nothing for idle transitions. Not really frequent?

You know how many packages are possible. What's
wrong with having a bitmap of physical packages which have a CPU
handling it?

id = topology_physical_package_id(cpu);
if (!__test_and_set_bit(id, &package_map)) {
cpumask_set_cpu(cpu, &rdt_cpumask);
cbm_update_msrs(cpu);
}

Four lines of code without loops and hoops. And here is a bitmap the
proper data structure.
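A userspace model of those four lines (test_and_set() stands in for __test_and_set_bit(), and the topology lookup is a hypothetical static table): the package bitmap records which packages already have a designated CPU, so onlining a CPU is a single test-and-set rather than a scan of the cpumask.

```c
#include <stdbool.h>

/* cpu -> physical package id; hypothetical 8-CPU, 2-package topology. */
static const int cpu_to_package[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };

static unsigned long long package_map;	/* one bit per package */

static bool test_and_set(int bit, unsigned long long *map)
{
	bool old = *map & (1ULL << bit);

	*map |= 1ULL << bit;
	return old;
}

/* Returns true when @cpu becomes the designated CPU for its package -
 * the point where the real code would cpumask_set_cpu() and
 * cbm_update_msrs(). */
bool rdt_cpu_online(int cpu)
{
	int id = cpu_to_package[cpu];

	return !test_and_set(id, &package_map);
}
```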

+static void intel_rdt_cpu_exit(unsigned int cpu)
+{
+ int phys_id = topology_physical_package_id(cpu);
+ int i;
+
+ mutex_lock(&rdt_group_mutex);
+ if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
+ mutex_unlock(&rdt_group_mutex);
+ return;
+ }
+
+ for_each_online_cpu(i) {
+ if (i == cpu)
+ continue;
+
+ if (phys_id == topology_physical_package_id(i)) {
+ cpumask_set_cpu(i, &rdt_cpumask);
+ break;
+ }
+ }

There is a better way than iterating over all online cpus ....
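One reading of "a better way": consult only the CPUs sharing the departing CPU's package (what topology_core_cpumask() gives the kernel) instead of every online CPU. A userspace model with a hypothetical per-package table:

```c
#define PKG_CPUS 4

/* Hypothetical topology: CPUs grouped by physical package. */
static const int package_cpus[2][PKG_CPUS] = {
	{ 0, 1, 2, 3 },
	{ 4, 5, 6, 7 },
};
static const int cpu_online[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };

/* Pick a replacement designated CPU from @pkg only, skipping the
 * departing @cpu; -1 means the package has no other online CPU. */
int pick_replacement(int cpu, int pkg)
{
	int i;

	for (i = 0; i < PKG_CPUS; i++) {
		int c = package_cpus[pkg][i];

		if (c != cpu && cpu_online[c])
			return c;
	}
	return -1;
}
```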


+static struct cftype rdt_files[] = {
+ {
+ .name = "cache_mask",
+ .seq_show = intel_cache_alloc_cbm_read,
+ .write_u64 = intel_cache_alloc_cbm_write,
+ .mode = 0666,
+ },
+ { } /* terminate */
+};

Where is the Documentation for that file? What's the content, format,
error codes ....

Will add a section to rdt.txt in Documentation/cgroups - will also move the documentation patch before all these patches.


Sigh.

tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/