Re: [PATCH v2 6/6] perf cpumap: Add range data encoding

From: Ian Rogers
Date: Wed Jun 29 2022 - 12:20:00 EST


On Wed, Jun 29, 2022 at 2:31 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Tue, Jun 14, 2022 at 07:33:53AM -0700, Ian Rogers wrote:
>
> SNIP
>
> > -static size_t cpus_size(const struct perf_cpu_map *map)
> > +static void synthesize_range_cpus(struct synthesize_cpu_map_data *data)
> > {
> > - return sizeof(struct cpu_map_entries) + perf_cpu_map__nr(map) * sizeof(u16);
> > + data->data->type = PERF_CPU_MAP__RANGE_CPUS;
> > + data->data->range_cpu_data.any_cpu = data->has_any_cpu;
> > + data->data->range_cpu_data.start_cpu = data->min_cpu;
> > + data->data->range_cpu_data.end_cpu = data->max_cpu;
> > }
> >
> > -static size_t mask_size(const struct perf_cpu_map *map, int *max)
> > -{
> > - *max = perf_cpu_map__max(map).cpu;
> > - return sizeof(struct perf_record_mask_cpu_map32) + BITS_TO_U32(*max) * sizeof(__u32);
> > -}
> > -
> > -static void *cpu_map_data__alloc(const struct perf_cpu_map *map, size_t *size,
> > - u16 *type, int *max)
> > +static void *cpu_map_data__alloc(struct synthesize_cpu_map_data *syn_data,
> > + size_t header_size)
> > {
> > size_t size_cpus, size_mask;
> > - bool is_dummy = perf_cpu_map__empty(map);
> >
> > - /*
> > - * Both array and mask data have variable size based
> > - * on the number of cpus and their actual values.
> > - * The size of the 'struct perf_record_cpu_map_data' is:
> > - *
> > - * array = size of 'struct cpu_map_entries' +
> > - * number of cpus * sizeof(u64)
> > - *
> > - * mask = size of 'struct perf_record_record_cpu_map' +
> > - * maximum cpu bit converted to size of longs
> > - *
> > - * and finally + the size of 'struct perf_record_cpu_map_data'.
> > - */
> > - size_cpus = cpus_size(map);
> > - size_mask = mask_size(map, max);
> > + syn_data->nr = perf_cpu_map__nr(syn_data->map);
> > + syn_data->has_any_cpu = (perf_cpu_map__cpu(syn_data->map, 0).cpu == -1) ? 1 : 0;
>
> I'm bit lost in the logic in here.. should it be '.cpu != -1' ?
> has_any_cpu is named as bool but used as index below ;-)
>
> could you please keep/update the comment above and explain
> the conditions when each cpu_map type is taken

So this relates to the CPU map "empty" problem that I've been moaning
about for a time with Adrian, Arnaldo, etc. The CPU map can contain -1
for the "any CPU" case in perf_event_open, it can also contain CPU
values after this as the result of a merge. Having only -1 in the CPU
map is referred to as a dummy (presumably because the dummy event uses
it), what does it mean if you have the -1 and CPU values? Things get
really messy when you look at what the definition of empty is. Here
I've used the term from perf_event_open that -1 means this can be on
any CPU. As the CPU map data is sorted we can take advantage that the
"any CPU" value always comes first. If we have it we set the bit here.
We also use the bit below so that we ignore the first array element if
it has -1 in it. As the -1 adjusts the CPU map size, there are some
other adjustments as well. Hopefully this makes things clearer and I
want to rewrite parts of the CPU map API to make them clearer too -
empty should mean == NULL or nr == 0, dummy is more confusing to me
than "any CPU", etc.

Thanks,
Ian

> thanks,
> jirka
>
> >
> > - if (is_dummy || (size_cpus < size_mask)) {
> > - *size += size_cpus;
> > - *type = PERF_CPU_MAP__CPUS;
> > - } else {
> > - *size += size_mask;
> > - *type = PERF_CPU_MAP__MASK;
> > + syn_data->min_cpu = perf_cpu_map__cpu(syn_data->map, syn_data->has_any_cpu).cpu;
> > + syn_data->max_cpu = perf_cpu_map__max(syn_data->map).cpu;
> > + if (syn_data->max_cpu - syn_data->min_cpu + 1 == syn_data->nr - syn_data->has_any_cpu) {
> > + /* A consecutive range of CPUs can be encoded using a range. */
> > + assert(sizeof(u16) + sizeof(struct perf_record_range_cpu_map) == sizeof(u64));
> > + syn_data->type = PERF_CPU_MAP__RANGE_CPUS;
> > + syn_data->size = header_size + sizeof(u64);
> > + return zalloc(syn_data->size);
> > }
> >
> > - *size += sizeof(__u16); /* For perf_record_cpu_map_data.type. */
> > - *size = PERF_ALIGN(*size, sizeof(u64));
> > - return zalloc(*size);
> > + size_cpus = sizeof(u16) + sizeof(struct cpu_map_entries) + syn_data->nr * sizeof(u16);
> > + /* Due to padding, the 4bytes per entry mask variant is always smaller. */
> > + size_mask = sizeof(u16) + sizeof(struct perf_record_mask_cpu_map32) +
> > + BITS_TO_U32(syn_data->max_cpu) * sizeof(__u32);
> > + if (syn_data->has_any_cpu || size_cpus < size_mask) {
> > + /* Follow the CPU map encoding. */
> > + syn_data->type = PERF_CPU_MAP__CPUS;
> > + syn_data->size = header_size + PERF_ALIGN(size_cpus, sizeof(u64));
> > + return zalloc(syn_data->size);
> > + }
> > + /* Encode using a bitmask. */
> > + syn_data->type = PERF_CPU_MAP__MASK;
> > + syn_data->size = header_size + PERF_ALIGN(size_mask, sizeof(u64));
> > + return zalloc(syn_data->size);
>
> SNIP