Re: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

From: Madhavan Srinivasan
Date: Thu Jul 09 2015 - 04:03:44 EST




On Thursday 09 July 2015 03:31 AM, Sukadev Bhattiprolu wrote:
> Madhavan Srinivasan [maddy@xxxxxxxxxxxxxxxxxx] wrote:
> | Parse device tree to detect supported nest pmu units. Traverse
> | through each nest pmu unit folder to find supported events and
> | corresponding unit/scale files (if any).
> |
> | The nest unit event file from DT, will contain the offset in the
> | reserved memory region to get the counter data for a given event.
> | Kernel code uses this offset as event configuration value.
> |
> | Device tree parser code also looks for scale/unit in the file name and
> | passes on the file as an event attr for perf tool to use in the post
> | processing.
> |
> | Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> | Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> | Cc: Paul Mackerras <paulus@xxxxxxxxx>
> | Cc: Anton Blanchard <anton@xxxxxxxxx>
> | Cc: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx>
> | Cc: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
> | Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> | Signed-off-by: Madhavan Srinivasan <maddy@xxxxxxxxxxxxxxxxxx>
> | ---
> | arch/powerpc/perf/nest-pmu.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
> | 1 file changed, 123 insertions(+), 1 deletion(-)
> |
> | diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> | index e7d45ed..6116ff3 100644
> | --- a/arch/powerpc/perf/nest-pmu.c
> | +++ b/arch/powerpc/perf/nest-pmu.c
> | @@ -11,6 +11,119 @@
> | #include "nest-pmu.h"
> |
> | static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> | +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> | +
> | +static int nest_event_info(struct property *pp, char *start,
>
> nit: s/start/name/?

Yes. Will change it.

>
> | + struct nest_ima_events *p8_events, int flg, u32 val)
>
> nit: s/flg/string/?

Yes. Will change it.

> | +{
> | + char *buf;
> | +
> | + /* memory for event name */
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + strncpy(buf, start, strlen(start));
> | + p8_events->ev_name = buf;
> | +
> | + /* memory for content */
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + if (flg) {
> | + /* string content*/
> | + if (!pp->value ||
> | + (strnlen(pp->value, pp->length) == pp->length))
> | + return -EINVAL;
> | +
> | + strncpy(buf, (const char *)pp->value, pp->length);
> | + } else
> | + sprintf(buf, "event=0x%x", val);
> | +
> | + p8_events->ev_value = buf;
> | + return 0;
> | +}
> | +
> | +static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | +{
> | + struct nest_ima_events **p8_events_arr, *p8_events;
> | + struct nest_pmu *pmu_ptr;
> | + struct property *pp;
> | + char *buf, *start;
> | + const __be32 *lval;
> | + u32 val;
> | + int idx = 0, ret;
> | +
> | + if (!dev)
> | + return -EINVAL;
> | +
> | + /* memory for nest pmus */
> | + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
> | + if (!pmu_ptr)
> | + return -ENOMEM;
> | +
> | + /* Needed for hotplug/migration */
> | + per_nest_pmu_arr[pmu_index] = pmu_ptr;
> | +
> | + /* memory for nest pmu events */
> | + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
> | + GFP_KERNEL);
> | + if (!p8_events_arr)
> | + return -ENOMEM;
> | + p8_events = (struct nest_ima_events *)p8_events_arr;
> | +
> | + /*
> | + * Loop through each property
> | + */
> | + for_each_property_of_node(dev, pp) {
> | + start = pp->name;
> | +
> | + if (!strcmp(pp->name, "name")) {
> | + if (!pp->value ||
> | + (strnlen(pp->value, pp->length) == pp->length))
> | + return -EINVAL;
>
> Do we need to check the string length here? If so, should we check against
> size we are going to allocate below (P8_NEST_MAX_PMU_NAME_LEN)? Or is it
> possible pp->value is not NULL terminated?

Yes we need and can add a check for size is below the
P8_NEST_MAX_PMU_NAME_LEN .

> | +
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + /* Save the name to register it later */
> | + sprintf(buf, "Nest_%s", (char *)pp->value);
> | + pmu_ptr->pmu.name = (char *)buf;
> | + continue;
> | + }
> | +
> | + /* Skip these, we dont need it */
> | + if (!strcmp(pp->name, "phandle") ||
> | + !strcmp(pp->name, "device_type") ||
> | + !strcmp(pp->name, "linux,phandle"))
> | + continue;
> | +
> | + if (strncmp(pp->name, "unit.", 5) == 0) {
> | + /* Skip first few chars in the name */
> | + start += 5;
> | + ret = nest_event_info(pp, start, p8_events++, 1, 0);
> | + } else if (strncmp(pp->name, "scale.", 6) == 0) {
> | + /* Skip first few chars in the name */
> | + start += 6;
> | + ret = nest_event_info(pp, start, p8_events++, 1, 0);
>
> Are the 'start.*' and 'unit.*' files events by themselves or just attributes
> of events?

These are attributes needed for computation. unit and scale attributes
will be used by perf tool in post-processing the counter data. These
can also use by other tools like pcp.

> These will show up in sysfs as:
>
> $ ls /sys/bus/event_source/devices/Nest_Alink_BW//events/
> Alink0 Alink0.unit Alink1.scale Alink2 Alink2.unit
> Alink0.scale Alink1 Alink1.unit Alink2.scale

Yes true.

> From the other PMUs, the 'events' directory contains just events.

For ex from x86 sandybridge system:

# ls
cas_count_read cas_count_read.unit cas_count_write.scale clockticks
cas_count_read.scale cas_count_write cas_count_write.unit

> Attributes of events, like descriptions go into a separate directory:
> Eg: For 24x7 counters:
>
> $ ls -d /sys/bus/event_source/devices/hv_24x7/event*
> /sys/bus/event_source/devices/hv_24x7/event_descs
> /sys/bus/event_source/devices/hv_24x7/event_long_descs
> /sys/bus/event_source/devices/hv_24x7/events

Yes. But these attributes are not informational like event description.

> If events have/need parameters, they go into the event file itself:
> Eg on an x86 box:
>
> $ cat /sys/bus/event_source/devices/cpu/events/cache-misses
> event=0x2e,umask=0x41
>
> For the nest events, are the 'units' and 'scale' files needed to
> identify the event (like the umask in x86 example above) or are they
> informational attributes (like descriptions for 24x7 counters)?

Yes. Again, these are not event parameter value to add in
the event configuration value or informational.

> IOW, following works on my x86 system (and with 24x7 counters):
>
> for i in `ls /sys/bus/event_source/devices/cpu/events/`; do
> perf stat -e cpu/$i/ sleep 1;
> done
>
> This loop will fail for Nest events, when it hits files like Alink0.unit.
>
> That said, I am not sure if the above for loop is supposed to work always!
> Maybe Peter Zijlstra can comment on that.

Yes.

> Are the units and scales needed/used by perf in computations? If just
> informational and, given that we can locate them from the device-tree,
> can we drop them altogether?
>
> If they are needed by perf and are attributes of an event, can we
> move them to separate directories?

I could prefer not to and here is my reason. Today perf tool
already have a mechanism to get the unit and scale values from
kernel, why to add one more and add code to perf tool to support it?

Thanks for review

Maddy


> /sys/bus/event_source/devices/Nest_Alink_BW/events
> /sys/bus/event_source/devices/Nest_Alink_BW/units
> /sys/bus/event_source/devices/Nest_Alink_BW/scales
>
> Sukadev
>
>
>
> | + } else {
> | + lval = of_get_property(dev, pp->name, NULL);
> | + val = (uint32_t)be32_to_cpup(lval);
> | +
> | + ret = nest_event_info(pp, start, p8_events++, 0, val);
> | + }
> | +
> | + if (ret)
> | + return ret;
> | +
> | + /* book keeping */
> | + idx++;
>
> I don't see idx being used after the increment?

Used in next patch.

> | + }
> | +
> | + return 0;
> | +}
> |
> | static int nest_ima_dt_parser(void)
> | {
> | @@ -19,7 +132,7 @@ static int nest_ima_dt_parser(void)
> | const __be64 *chip_ima_size;
> | struct device_node *dev;
> | struct perchip_nest_info *p8ni;
> | - int idx;
> | + int idx, ret;
> |
> | /*
> | * "nest-ima" folder contains two things,
> | @@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
> | p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
> | }
> |
> | + /* Look for supported Nest PMU units */
> | + idx = 0;
> | + for_each_node_by_type(dev, "nest-ima-unit") {
> | + ret = nest_pmu_create(dev, idx);
> | + if (ret)
> | + return ret;
> | + idx++;
>
> idx not used?

used by code in patch 6 of this series.

Thanks
Maddy

> | + }
> | +
> | return 0;
> | }
> |
> | --
> | 1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/