Re: [PATCH v2] perf/hx_arm_ni: Support uncore ARM NI-700 PMU

From: Yang Jialong 杨佳龙
Date: Wed Jan 31 2024 - 22:02:01 EST




在 2024/1/31 18:36, Krzysztof Kozlowski 写道:
On 31/01/2024 11:07, Yang Jialong 杨佳龙 wrote:


在 2024/1/31 17:38, Krzysztof Kozlowski 写道:
On 31/01/2024 10:07, Yang Jialong 杨佳龙 wrote:


在 2024/1/31 15:59, Krzysztof Kozlowski 写道:
On 31/01/2024 08:08, JiaLong.Yang wrote:
This code is based on uncore PMUs arm_smmuv3_pmu and arm-cmn.
One ni-700 can have many clock domains. Each of them has only one PMU.
Here one PMU corresponds to one 'struct ni_pmu' instance.
PMU name will be ni_pmu_N_M, which N means different NI-700s and M means
different PMU in one NI-700. If only one NI-700 found in NI-700, name will
be ni_pmu_N.
Node interface event name will be xxni_N_eventname, such as
asni_0_rdreq_any. There are many kinds of type of nodes in one clock
domain. Also means that there are many kinds of that in one PMU. So we
distinguish them by xxni string. Besides, maybe there are many nodes
have same type. So we have number N in event name.
By ni_pmu_0_0/asni_0_rdreq_any/, we can pinpoint accurate bus traffic.
Example1: perf stat -a -e ni_pmu_0_0/asni_0_rdreq_any/,ni_pmu_0_0/cycles/
EXample2: perf stat -a -e ni_pmu_0_0/asni,id=0,event=0x0/

Signed-off-by: JiaLong.Yang <jialong.yang@xxxxxxxxxxxx>
---
v1 --> v2:
1. Submit MAINTANER Documentation/ files seperately.

SEPARATE PATCHES, not patchsets. You have now checkpatch warnings
because of this...

...OK. But the MAINTANER file changing should be given in which one
patches.
I will submit patch v3 after talking and your permission.


2. Delete some useless info printing.
3. Change print from pr_xxx to dev_xxx.
4. Fix more than 75 length log info.
5. Fix dts attribute pccs-id.
6. Fix generic name according to DT specification.
7. Some indentation.
8. Del of_match_ptr macro.

drivers/perf/Kconfig | 11 +
drivers/perf/Makefile | 1 +
drivers/perf/hx_arm_ni.c | 1284 ++++++++++++++++++++++++++++++++++++++
3 files changed, 1296 insertions(+)
create mode 100644 drivers/perf/hx_arm_ni.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index ec6e0d9194a1..95ef8b13730f 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -241,4 +241,15 @@ config CXL_PMU
If unsure say 'm'.
+config HX_ARM_NI_PMU
+ tristate "HX ARM NI-700 PMU"
+ depends on PPC_HX_C2000 && 64BIT

1. There is no PPC_HX_C2000.

I have been used to using this macro. However this macro is not existed
in mainline.
I will replace it with ARM64. And del involved C code if OK.

64bit:
__ffs(unsigned long) and __fls(unsigned long) will be wrong in 32bit. I
pass a u64 argument.

One thing is where the code is supposed to run, second thing is compile
testing.


Now run on my company product, a 64bit PowerPC...
But I think it's general for 64bit systems.

Why do you use __ffs, not __ffs64 which takes u64 if you really want
only 64bit argument? unsigned long != u64, so your code is not
architecture independent. You claim you wrote it on purpose as
non-architecture-independent, but then I claim it's a bug. We are
supposed to write code which is portable, as much as possible, assuming
it does not affect readability.


I write code in v5.18, there are __ffs64() and fls64(). Asymmetric.

Sorry, that's a no go.

That's some very, very old kernel. Do not develop on old kernels, but on
mainline. I also suspect that by basing your work on old kernel, you
duplicate a lot of issues already fixed.

There are some difference in return val between __ffs() and ffs64().
__ffs(0) and ffs64(0) will give different value.

__ffs64 calls __ffs, so why would results be different?

Anyway, that's not really excuse.


OK. Follow mainline.



And I'm sure code run in 64bit. So I choose to use __ffs and __fls.

Maybe it could be compatbile with 32bit. But I don't have a environment
to test this.

struct ni_hw_perf_event will be big than limit.
BUILD_BUG_ON(sizeof(struct ni_hw_perf_event) > offsetof(struct
hw_perf_event, target));

And why do you need to use any of such code? Please open one of hundreds
of other drivers which work correctly on 32 and 64-bit platforms.


Code for 64bit.
This code is to avoid struct ni_hw_perf_event is too big than struct
hw_perf_event::target.

1. Why would that matter? target is task_struct. It's size does not
matter. Maybe its offset matters, but not size.


Offset.

2. So you claim that on 32-bit system the structure will be bigger than
on 64-bit system?

The structure will exceed the offset in 32bit. Maybe because the latter changed more.
OK. Dont care please.


I learn it from arm-cmn.c.

Are you copying patterns because they are good patterns or just because
you decided to copy?

Maybe this way is not very good for event framework.
OK. Not an official way.


ni_hw_perf_event will replace hw_perf_event.
I will put some useful information in it with less space and good field
names.
But I can't exceed a limit.


2. Nothing justified dependency on 64bit. Drop or explain. Your previous
message did not provide real rationale.

If ARM64, then drop.

...

...

+ /* Step2: Traverse all clock domains. */
+ for (cd_idx = 0; cd_idx < ni->cd_num; cd_idx++) {
+ cd = cd_arrays[cd_idx];
+
+ num = ni_child_number(cd);
+ dev_dbg(dev, "The %dth clock domain has %d child nodes:", cd_idx, num);
+
+ /* Omit pmu node */
+ ni_pmu = devm_kzalloc(dev, struct_size(ni_pmu, ev_src_nodes, num - 1),
+ GFP_KERNEL);
+ ni_pmu->ev_src_num = num - 1;
+
+ if (!ni_pmu)
+ return -ENOMEM;
+
+ num_idx = 0;
+ for (nd_idx = 0; nd_idx < num; nd_idx++) {
+ nd = ni_child_pointer(pbase, cd, nd_idx);
+
+ node.base = nd;
+ node.node_type = ni_node_node_type(nd);
+
+ if (unlikely(ni_node_type(nd) == NI_PMU))
+ ni_pmu->pmu_node = node;
+ else
+ ni_pmu->ev_src_nodes[num_idx++] = node;
+ dev_dbg(dev, " name: %s id: %d", ni_node_name[node.type], node.id);
+ }
+
+ ni_pmu->dev = dev;
+ ni_pmu->ni = ni;
+ ni->ni_pmus[cd_idx] = ni_pmu;
+ }
+
+ devm_kfree(dev, cd_arrays);

Why? If it is not device-lifetime then allocate with usual way.


No device-lifetime.
Will allocate in stack.

I was thinking about kzalloc. But if array is small, stack could be as well.


If I have to return before devm_kfree because of wrong, I will have to use:

goto out;

out:
kfree();

But if I use devm_kzalloc, I will not be worried about that. Even if no

devm* is not for that purpose. devm is for device-managed allocations.
Device does not manage your allocation.

device-lifetime.
Isn't this a good way?

Then you want cleanup.h and use proper __free().

Good NEW API. It does what I want.
Learned more. Thanks.


Best regards,
Krzysztof