Re: [PATCH v2] perf/hx_arm_ni: Support uncore ARM NI-700 PMU

From: Yang Jialong 杨佳龙
Date: Wed Jan 31 2024 - 05:09:24 EST




在 2024/1/31 17:38, Krzysztof Kozlowski 写道:
On 31/01/2024 10:07, Yang Jialong 杨佳龙 wrote:


在 2024/1/31 15:59, Krzysztof Kozlowski 写道:
On 31/01/2024 08:08, JiaLong.Yang wrote:
This code is based on uncore PMUs arm_smmuv3_pmu and arm-cmn.
One ni-700 can have many clock domains. Each of them has only one PMU.
Here one PMU corresponds to one 'struct ni_pmu' instance.
PMU name will be ni_pmu_N_M, which N means different NI-700s and M means
different PMU in one NI-700. If only one NI-700 found in NI-700, name will
be ni_pmu_N.
Node interface event name will be xxni_N_eventname, such as
asni_0_rdreq_any. There are many kinds of type of nodes in one clock
domain. Also means that there are many kinds of that in one PMU. So we
distinguish them by xxni string. Besides, maybe there are many nodes
have same type. So we have number N in event name.
By ni_pmu_0_0/asni_0_rdreq_any/, we can pinpoint accurate bus traffic.
Example1: perf stat -a -e ni_pmu_0_0/asni_0_rdreq_any/,ni_pmu_0_0/cycles/
EXample2: perf stat -a -e ni_pmu_0_0/asni,id=0,event=0x0/

Signed-off-by: JiaLong.Yang <jialong.yang@xxxxxxxxxxxx>
---
v1 --> v2:
1. Submit MAINTANER Documentation/ files seperately.

SEPARATE PATCHES, not patchsets. You have now checkpatch warnings
because of this...

...OK. But the MAINTANER file changing should be given in which one
patches.
I will submit patch v3 after talking and your permission.


2. Delete some useless info printing.
3. Change print from pr_xxx to dev_xxx.
4. Fix more than 75 length log info.
5. Fix dts attribute pccs-id.
6. Fix generic name according to DT specification.
7. Some indentation.
8. Del of_match_ptr macro.

drivers/perf/Kconfig | 11 +
drivers/perf/Makefile | 1 +
drivers/perf/hx_arm_ni.c | 1284 ++++++++++++++++++++++++++++++++++++++
3 files changed, 1296 insertions(+)
create mode 100644 drivers/perf/hx_arm_ni.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index ec6e0d9194a1..95ef8b13730f 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -241,4 +241,15 @@ config CXL_PMU
If unsure say 'm'.
+config HX_ARM_NI_PMU
+ tristate "HX ARM NI-700 PMU"
+ depends on PPC_HX_C2000 && 64BIT

1. There is no PPC_HX_C2000.

I have been used to using this macro. However this macro is not existed
in mainline.
I will replace it with ARM64. And del involved C code if OK.

64bit:
__ffs(unsigned long) and __fls(unsigned long) will be wrong in 32bit. I
pass a u64 argument.

One thing is where the code is supposed to run, second thing is compile
testing.


Now run on my company product, a 64bit PowerPC...
But I think it's general for 64bit systems.

Why do you use __ffs, not __ffs64 which takes u64 if you really want
only 64bit argument? unsigned long != u64, so your code is not
architecture independent. You claim you wrote it on purpose as
non-architecture-independent, but then I claim it's a bug. We are
supposed to write code which is portable, as much as possible, assuming
it does not affect readability.


I write code in v5.18, there are __ffs64() and fls64(). Asymmetric.
There are some difference in return val between __ffs() and ffs64().
__ffs(0) and ffs64(0) will give different value.

And I'm sure code run in 64bit. So I choose to use __ffs and __fls.

Maybe it could be compatbile with 32bit. But I don't have a environment to test this.

struct ni_hw_perf_event will be big than limit.
BUILD_BUG_ON(sizeof(struct ni_hw_perf_event) > offsetof(struct
hw_perf_event, target));

And why do you need to use any of such code? Please open one of hundreds
of other drivers which work correctly on 32 and 64-bit platforms.


Code for 64bit.
This code is to avoid struct ni_hw_perf_event is too big than struct hw_perf_event::target.
I learn it from arm-cmn.c.
ni_hw_perf_event will replace hw_perf_event.
I will put some useful information in it with less space and good field names.
But I can't exceed a limit.


2. Nothing justified dependency on 64bit. Drop or explain. Your previous
message did not provide real rationale.

If ARM64, then drop.

...

...

+ /* Step2: Traverse all clock domains. */
+ for (cd_idx = 0; cd_idx < ni->cd_num; cd_idx++) {
+ cd = cd_arrays[cd_idx];
+
+ num = ni_child_number(cd);
+ dev_dbg(dev, "The %dth clock domain has %d child nodes:", cd_idx, num);
+
+ /* Omit pmu node */
+ ni_pmu = devm_kzalloc(dev, struct_size(ni_pmu, ev_src_nodes, num - 1),
+ GFP_KERNEL);
+ ni_pmu->ev_src_num = num - 1;
+
+ if (!ni_pmu)
+ return -ENOMEM;
+
+ num_idx = 0;
+ for (nd_idx = 0; nd_idx < num; nd_idx++) {
+ nd = ni_child_pointer(pbase, cd, nd_idx);
+
+ node.base = nd;
+ node.node_type = ni_node_node_type(nd);
+
+ if (unlikely(ni_node_type(nd) == NI_PMU))
+ ni_pmu->pmu_node = node;
+ else
+ ni_pmu->ev_src_nodes[num_idx++] = node;
+ dev_dbg(dev, " name: %s id: %d", ni_node_name[node.type], node.id);
+ }
+
+ ni_pmu->dev = dev;
+ ni_pmu->ni = ni;
+ ni->ni_pmus[cd_idx] = ni_pmu;
+ }
+
+ devm_kfree(dev, cd_arrays);

Why? If it is not device-lifetime then allocate with usual way.


No device-lifetime.
Will allocate in stack.

I was thinking about kzalloc. But if array is small, stack could be as well.


If I have to return before devm_kfree because of wrong, I will have to use:

goto out;

out:
kfree();

But if I use devm_kzalloc, I will not be worried about that. Even if no device-lifetime.
Isn't this a good way?

...


+
+static const struct of_device_id ni_pmu_of_match[] = {
+ { .compatible = "hx,c2000-arm-ni" },

Don't send undocumented compatibles.

OK. Means I should send doc and code in one patch thread with more than
one patch?

Yes. Please open lore.kernel.org and look at any other submissions
involving bindings or other type of ABI documentation (like sysfs).

Get.


Best regards,
Krzysztof