Re: [PATCH v3] arch/x86: port I/O tracing on x86

From: Dan Raymond
Date: Wed Oct 04 2023 - 18:54:28 EST


On 10/3/2023 6:50 AM, Greg KH wrote:
>> +/*
>> + * We don't want the tracing logic included in the early boot modules (under
>> + * arch/x86/boot) so we check for their include guards here. If we don't do
>> + * this we will get compiler errors. These checks are not present in
>> + * arch/x86/include/asm/msr.h which contains similar tracing logic. That is
>> + * possible only because none of the msr inline functions are instantiated in
>> + * the early boot modules. If that changes this issue will need to be addressed
>> + * there as well. Therefore it might be better to handle this centrally in
>> + * tracepoint-defs.h.
>> + */
>> +
>> +#if defined(CONFIG_TRACEPOINTS) && !defined(BOOT_COMPRESSED_MISC_H) && !defined(BOOT_BOOT_H)
>
> I see what you are doing here in trying to see if a .h file has been
> included already, but now you are making an assumption on both the .h
> file ordering, and the #ifdef guard for those .h files, which are
> something that we almost never remember or even consider when dealing
> with .h files files.

With one exception io.h is included from boot.h or misc.h which is where
the include guards are defined:

# find arch/x86/boot -type f -print0 | xargs -0 grep "#include.*[^a-z]io\.h"
arch/x86/boot/boot.h:#include "io.h"
arch/x86/boot/compressed/misc.h:#include "../io.h"
arch/x86/boot/compressed/tdx.c:#include "../io.h"
arch/x86/boot/io.h:#include <asm/shared/io.h>

I agree this is fragile but the problem is not confined to this patch.
If I add a call to rdmsr() or wrmsr() in arch/x86/boot/compressed/misc.c
I get the same compiler error. It has something to do with the inline
assembly inside arch/x86/include/asm/jump_label.h.

I've copied Steven Rostedt who is the maintainer of tracefs to see if he
has any comment. I just noticed arch/x86/boot/msr.h and I see that it
redefines rdmsr() and wrmsr() and omits the tracepoints. A comment there
explains:

/*
* The kernel proper already defines rdmsr()/wrmsr(), but they are not for the
* boot kernel since they rely on tracepoint/exception handling infrastructure
* that's not available here.
*/

We could do something similar for inb()/outb() and redefine them in
arch/x86/boot/io.h instead of including <asm/shared/io.h> there.

> Who wants/needs port tracing these days? What types of systems still
> rely on that for their primary form of I/O other than some old-school
> serial ports?
>
> The MMIO tracing was added because there are crazy out-of-tree SoC
> devices out there that "insisted" that they need to hook into the mmio
> access path, so they added traceing there under the auspicious of trying
> to log all mmio accesses so that they could then override the access
> path in the tracehook to do who-knows-what other things. Hopefully you
> are not wanting to do the same thing here as well?
>
> And have you addressed all of Peter's previous review comments?

The author of the CONFIG_TRACE_MMIO_ACCESS feature (Sai Prakash Ranjan)
stated that he uses it for debugging unclocked register accesses on ARM.
Andy Shevchenko mentioned that he has used MMIO tracing for tracing
memory-mapped UARTs on x86. I was motivated to create this feature to
debug UART problems in an environment that runs Linux on a Celeron-based
SBC which uses port-mapped UARTs.

This feature can also be used for other things besides UARTs. Things
like the PIT and interrupt controller use port-mapped I/O. I work on
an embedded system that uses an ISA card cage to connect peripherals
to our SBC and all communication over the ISA bus uses port I/O.

Yes, I believe I have addressed all of Peter's review comments:

1) cannot use early_console->write()

That is not true. Everything that includes boot.h (ie.
'arch/x86/boot/early_serial_console.c') will not change due to the
'#ifdef BOOT_BOOT_H' guard. Also tracing will not be compiled in
unless 'CONFIG_TRACEPOINTS' is true. Finally, tracing will be
disabled at boot until the user mounts the tracefs and explicitly
enables 'portio' events. When tracing is disabled a nop
instruction is used in lieu of a jmp instruction to avoid
executing any trace code.

2) tracing relies on RCU which might not be alive yet and might
itself be the problem

RCU is not needed unless/until tracing is enabled. I also
demonstrated that port I/O tracing works correctly during boot by
testing this patch with "trace_event=portio" on the command line.

>> obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
>> +obj-$(CONFIG_TRACEPOINTS) += trace_portio.o
>
> So this is always enabled? No separate config option? Why not?

The CONFIG_TRACE_MMIO_ACCESS feature has a separate config option
but it incurs a significant amount of overhead. For example
mem32_serial_out() comprises 4 instructions (12 bytes) when
CONFIG_TRACE_MMIO_ACCESS is false. It comprises 33 instructions
(101 bytes) when CONFIG_TRACE_MMIO_ACCESS is true and all of those
instructions are executed (including two function calls) whether
tracing is enabled or not.

This feature incurs much less overhead. For example io_serial_out()
comprises 4 instructions (6 bytes) when CONFIG_TRACEPOINTS is false.
It comprises 14 instructions (34 bytes) when CONFIG_TRACEPOINTS is
true but only two of the extra instructions are executed when
tracing is disabled (and neither of them are branches). The
extra instructions for tracing are marked with *** below:

c104e740 <io_serial_out>:
c104e740: 01 c2 add edx,eax
c104e742: 88 c8 mov al,cl
c104e744: 0f b7 d2 *** movzx edx,dx ***
c104e747: ee out dx,al
c104e748: 3e 8d 74 26 00 *** lea esi,ds:[esi+eiz*1+0x0] ***
c104e74d: c3 ret

c104e74e: 66 90 *** xchg ax,ax ***
c104e750: 55 *** push ebp ***
c104e751: 0f b6 c1 *** movzx eax,cl ***
c104e754: 89 e5 *** mov ebp,esp ***
c104e756: b9 62 00 00 00 *** mov ecx,0x62 ***
c104e75b: e8 80 1e 33 00 *** call c13805e0 <do_trace_portio_write> ***
c104e760: 5d *** pop ebp ***
c104e761: c3 *** ret ***

This is similar to the tracing overhead of rdmsr()/wrmsr() which
does not have a separate config option.

>> +/* SPDX-License-Identifier: GPL-2.0+ */
>
> Why "+"? (I have to ask).
>
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>
> This is -only, which is fine, but your patch doesn't seem to be uniform
> here for new files being added in the same patch, right? So documenting
> this somewhere (i.e. in the changelog), is essential.

I did not intend to be inconsistent. I can make them both GPL-2.0-only if
that is preferred.

>> +DECLARE_EVENT_CLASS(portio_class,
>> + TP_PROTO(u32 value, u16 port, char width, long ip_addr),
>
> Memory locations are stored in "unsigned long" not "long", right?

It doesn't matter since "unsigned long" is the same size as "long".
I should use "void *" here but that will require a cast at the call
sites because for some reason _RET_IP_ is casting the pointer to
"unsigned long".

>> + TP_printk("port=0x%04x value=0x%0*x %pS",
>> + __entry->port,
>> + __entry->width == 'b' ? 2 :
>> + __entry->width == 'w' ? 4 : 8,
>> + __entry->value, (void *)__entry->ip_addr)
>
> Logging kernel memory locations, why? Where is this format documented?

I borrowed this technique from the CONFIG_TRACE_MMIO_ACCESS feature.
It indicates from where the port access originated which can be
useful. For example, it will reveal problems caused by two sections
of code accessing the same resource simultaneously. It can also be
used for trace filtering or triggering. The "%pS" format is
documented here:

https://www.kernel.org/doc/Documentation/printk-formats.txt

Here is an example of the output:

<idle>-0 [000] d.s2. 244.534473: portio_read: port=0xeb12 value=0x0080 uhci_check_ports+0x7c/0x2a0
<idle>-0 [000] d.s2. 244.534474: portio_read: port=0xeb10 value=0x0080 uhci_hub_status_data+0x2dc/0x4b0
<idle>-0 [000] d.s2. 244.534475: portio_read: port=0xeb12 value=0x0080 uhci_hub_status_data+0x2dc/0x4b0
my_app-90 [000] d..1. 244.545706: portio_write: port=0x02f9 value=0x07 io_serial_out+0x2d/0x30
my_app-90 [000] d.h2. 244.545720: portio_read: port=0x02fa value=0xc2 io_serial_in+0x36/0x40
my_app-90 [000] d.h3. 244.545722: portio_read: port=0x02fd value=0x60 io_serial_in+0x36/0x40
my_app-90 [000] d.h3. 244.545725: portio_read: port=0x02fe value=0x00 io_serial_in+0x36/0x40
my_app-90 [000] d.h3. 244.545727: portio_write: port=0x02f8 value=0xc9 io_serial_out+0x2d/0x30
my_app-90 [000] d.h3. 244.545732: portio_write: port=0x02f9 value=0x05 io_serial_out+0x2d/0x30
my_app-90 [000] d.h2. 244.545734: portio_read: port=0x02fa value=0xc1 io_serial_in+0x36/0x40
<idle>-0 [000] d.s2. 244.590459: portio_read: port=0x0408 value=0x008e2dfc acpi_pm_read_verified+0x9a/0xb0
<idle>-0 [000] d.s2. 244.590463: portio_read: port=0x0408 value=0x008e2e19 acpi_pm_read_verified+0x82/0xb0
<idle>-0 [000] d.s2. 244.590464: portio_read: port=0x0408 value=0x008e2e1e acpi_pm_read_verified+0x6d/0xb0