Re: [PATCHv4 2/2] arm64/io: Add a header for mmio access instrumentation

From: Sai Prakash Ranjan
Date: Mon Nov 22 2021 - 09:19:48 EST


On 11/22/2021 7:29 PM, Arnd Bergmann wrote:
On Mon, Nov 22, 2021 at 2:35 PM Sai Prakash Ranjan
<quic_saipraka@xxxxxxxxxxx> wrote:
On 11/19/2021 9:36 AM, Sai Prakash Ranjan wrote:

So I looked at logic_iomem.c which seems to be useful for emulated IO
for virtio drivers
but our usecase just needs to log the mmio operations and no additional
stuff, similar to
the logging access of x86 msr registers via tracepoint
(arch/x86/include/asm/msr-trace.h).
I think it depends on whether one wants to filter the MMIO access based
on the device, or based on the caller.

Also raw read/write macros in logic_iomem.c have the callbacks which
seems to be pretty costly
than inlining or direct function call given it has to be called for
every register read and write
which are going to be thousands in our case. In their usecase, read and
write callbacks are just
pci cfgspace reads and writes which may not be that frequently called
and the latency might not
be visible but in our case, I think it would be visible if we have a
callback as such. I know this is a
debug feature and perf isn't expected much but that wouldn't mean we
should not have a debug
feature which performs better right.
I would expect the cost of a bus access to always dwarf the cost of
indirect function calls and instrumentation. On the other hand,
the cost of an inline trace call is nontrivial in terms of code size,
which may lead to wasting significant amounts of both RAM and
instruction cache on small machines. If you want to continue with
your approach, it would help to include code size numbers before/after
for a defconfig kernel, and maybe some performance numbers to
show what this does when you enable tracing for all registers of
a device with a lot of accesses.

Sure, I will get the numbers for both cases(inline and indirect calls) and run some
benchmark tests with register tracing enabled for both cases.


On the second point, filtering by ioremap isn't much useful for our
usecase since ioremapped
region can have 100s of registers and we are interested in the exact
register read/write which
would cause any of the issues mentioned in the description of this patchset.

So I feel like the current way where we consolidate the instrumentation
in mmio-instrumented.h
seems like the better way than adding tracing to an emulated iomem
library.
There is another point that I don't like in the implementation, which is
the extra indirection. If we end up with your approach of doing it
inline per caller, I would prefer having the instrumentation in
include/asm-generic/io.h, like

#ifndef readl
#define readl readl
static inline u32 readl(const volatile void __iomem *addr)
{
u32 val;

__io_br();
val = __le32_to_cpu((__le32 __force)__raw_readl(addr));
__io_ar(val);
if (tracepoint_enabled(rwmmio_read))
log_read_mmio("readl", addr, val);
return val;
}
#endif

I think this would be a lot less confusing to readers, as it is implemented
exactly in the place that has the normal definition, and it can also have
somewhat more logical semantics by only instrumenting the
normal/relaxed/ioport accessors but not the __raw_* versions that
are meant to be little more than a pointer dereference.

Arnd

But how is this different from logic in atomic-instrumented.h which also has asm-generic version?
Initial review few years back mentioned about having something similar to atomic instrumentation
and hence it was implemented with the similar approach keeping instrumentation out of arch specific
details.
And if we do move this instrumentation to asm-generic/io.h, how will that be executed since
the arch specifc read{b,w,l,q} overrides this generic version?

Thanks,
Sai