Re: [PATCH v3 1/8] media: hantro: Trace hevc hw cycles performance register

From: Benjamin Gaignard
Date: Tue Jun 22 2021 - 08:23:32 EST



Le 18/06/2021 à 20:58, Ezequiel Garcia a écrit :
Hi Benjamin,

On Fri, 2021-06-18 at 15:15 +0200, Benjamin Gaignard wrote:
After each hevc decoded frame trace the hardware performance.
It provides the number of hw cycles spend per decoded macroblock.

Please add some documentation about how these are supposed
to be used. It will be easier to discuss after seeing
things in actiion.

A good place for the documentation would be:

https://www.kernel.org/doc/html/latest/admin-guide/media/v4l-drivers.html

Ok that will be in v4


[..]
@@ -22,6 +23,21 @@ static inline void hantro_write_addr(struct hantro_dev *vpu,
        vdpu_write(vpu, addr & 0xffffffff, offset);
 }
+void hantro_g2_hevc_dec_done(struct hantro_ctx *ctx)
I'm worried about the runtime cost this would have.

I see other drivers (i915, panfrost) seem to have an ioctl
to enable the perf counters.

Perhaps we don't need an ioctl, but a module param would be enough
for now.

You can be reassured the overhead of traces points is very low has explained
in this article: https://lwn.net/Articles/330402/


+{
+       const struct hantro_hevc_dec_ctrls *ctrls = &ctx->hevc_dec.ctrls;
+       const struct v4l2_ctrl_hevc_sps *sps = ctrls->sps;
+       struct hantro_dev *vpu = ctx->dev;
+       u32 hw_cycles = 0;
+       u32 mbs = (sps->pic_width_in_luma_samples *
+                  sps->pic_height_in_luma_samples) >> 8;
+
+       if (mbs)
+               hw_cycles = vdpu_read(vpu, G2_HW_PERFORMANCE) / mbs;
+
+       trace_hantro_hevc_perf(ctx, hw_cycles);
+}
+

[..]
+
+TRACE_EVENT(hantro_hevc_perf,
+       TP_PROTO(struct hantro_ctx *ctx, u32 hw_cycles),
+
+       TP_ARGS(ctx, hw_cycles),
+
+       TP_STRUCT__entry(
+               __field(int, minor)
+               __field(u32, hw_cycles)
+       ),
+
+       TP_fast_assign(
+               __entry->minor = ctx->fh.vdev->minor;
Tracking performance per minor doesn't seem useful,
we'd like to track per-fd (i.e. per context).

This part of the driver doesn't know for which fd the decoding job is done
so impossible to add it there.

Regards,
Benjamin


Thanks,
Ezequiel