Re: [PATCH v3 3/3] genirq: Use the maple tree for IRQ descriptors management

From: Yujie Liu
Date: Sun May 07 2023 - 04:08:45 EST


Hi Thomas,

Sorry for late reply as we were on public holiday earlier this week.

On Fri, Apr 28, 2023 at 12:31:14PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 28 2023 at 09:33, Yujie Liu wrote:
> > On Wed, Apr 26, 2023 at 02:08:54PM +0200, Thomas Gleixner wrote:
> >> Can you please retest this on v6.3 and report back when the problem
> >> persists?
> >
> > Thanks for your help looking into this problem.
> >
> > The problem persists when tested on v6.3, but not 100% reproducible.
> > We ran the test on v6.3 and v6.3+patch each for 20 runs. There are 9
> > failed runs on v6.3+patch, while v6.3 is all clean. Full dmesg is
> > attached.
>
> Under the assumption that the code is correct, then the effect of this
> patch is that it changes the timing. Sigh.
>
> 1) Does this happen with a 64-bit kernel too?

It doesn't happen on a 64-bit kernel:

=========================================================================================
commit/compiler/disk/kconfig/rootfs/tbox_group/test/testcase:
32c58fc685e5cd6b5947a5f8e9a3c318a63fcf0a/gcc-11/1SSD/i386-debian-10.3-func/debian-11.1-i386-20220923.cgz/lkp-skl-d06/block-group-00/blktests

commit/compiler/disk/kconfig/rootfs/tbox_group/test/testcase:
32c58fc685e5cd6b5947a5f8e9a3c318a63fcf0a/gcc-11/1SSD/x86_64-rhel-8.3-func/debian-11.1-x86_64-20220510.cgz/lkp-skl-d06/block-group-00/blktests

kconfig:
i386-debian-10.3-func
x86_64-rhel-8.3-func

i386-debian-10.3 x86_64-rhel-8.3-func
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
16:48 -33% :7 dmesg.EIP:default_send_IPI_mask_logical
16:48 -33% :7 dmesg.WARNING:at_arch/x86/kernel/apic/ipi.c:#default_send_IPI_mask_logical

> 2) Can you enable the irq_vector:vector_*.* tracepoints and provide
> the trace?

I'm a beginner of kernel and not sure if I'm doing this correctly. Here
are my test steps:

# enable irq_vector events
# cat /sys/kernel/debug/tracing/set_event
irq_vectors:vector_free_moved
irq_vectors:vector_setup
irq_vectors:vector_teardown
irq_vectors:vector_deactivate
irq_vectors:vector_activate
irq_vectors:vector_alloc_managed
irq_vectors:vector_alloc
irq_vectors:vector_reserve
irq_vectors:vector_reserve_managed
irq_vectors:vector_clear
irq_vectors:vector_update
irq_vectors:vector_config

# turn on the tracer
# cat /sys/kernel/debug/tracing/tracing_on
1

# run the testcase
# ./check block/008
block/008 => sdb2 (do IO while hotplugging CPUs)
read iops 0 ...
runtime 0.576s ...

block/008 => sdb2 (do IO while hotplugging CPUs) [failed]
read iops 0 ... 0d06 at May 7 07:33:46 ...
runtime 0.576s ... 0.580sommon_interrupt: 2.34 No irq handler for vector
--- tests/block/008.out 2023-04-18 17:36:53.000000000 +0000
+++ /lkp/benchmarks/blktests/results/sdb2/block/008.out.bad 2023-05-07 07:33:47.336596084 +0000
@@ -1,2 +1,8 @@
Running block/008
+fio: io_u error on file /dev/sdb2: Input/output error: read offset=273571840, buflen=4096
+fio: io_u error on file /dev/sdb2: Input/output error: read offset=64757760, buflen=4096
+fio: io_u error on file /dev/sdb2: Input/output error: read offset=700571648, buflen=4096
+fio: io_u error on file /dev/sdb2: Input/output error: read offset=105357312, buflen=4096
+fio exited with status 0
+4;fio-3.25;reads;0;5;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0.000000%;0.000000%;12;0;163;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;sdb;0;0;0;0;0;0;0;0.00%
...
(Run 'diff -u tests/block/008.out /lkp/benchmarks/blktests/results/sdb2/block/008.out.bad' to see the entire diff)

# check the trace
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |

Nothing was written to trace buffer, seems like no irq_vector events
were captured during this test.

> And please provide /proc/interrupts from that machine.

Before running the test:

# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 34 0 0 0 IO-APIC 2-edge timer
1: 4 0 0 0 IO-APIC 1-edge i8042
4: 0 0 635 0 IO-APIC 4-edge ttyS0
8: 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 0 0 0 6 IO-APIC 12-edge i8042
16: 0 0 0 4 IO-APIC 16-fasteoi i801_smbus
19: 12 0 0 0 IO-APIC 19-fasteoi
120: 3608 0 617 0 PCI-MSI-0000:00:1f.6 0-edge eth0
121: 50 0 0 0 PCI-MSI-0000:00:16.0 0-edge mei_me
122: 0 0 0 0 PCI-MSI-0000:00:14.0 0-edge xhci_hcd
123: 0 0 2109981 0 PCI-MSI-0000:00:17.0 0-edge ahci[0000:00:17.0]
124: 0 0 0 206 PCI-MSI-0000:00:02.0 0-edge i915
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 221252 228764 471985 229286 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 16 0 0 8 IRQ work interrupts
RTR: 108 25 15 11 APIC ICR read retries
RES: 5016 5486 697 10440 Rescheduling interrupts
CAL: 4618 4578 2775 5754 Function call interrupts
TLB: 12 15 14 18 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 4 108 109 107 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event

After running the test:

# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 34 0 0 0 IO-APIC 2-edge timer
1: 4 0 0 0 IO-APIC 1-edge i8042
4: 0 0 635 0 IO-APIC 4-edge ttyS0
8: 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 0 0 0 6 IO-APIC 12-edge i8042
16: 0 0 0 4 IO-APIC 16-fasteoi i801_smbus
19: 12 0 0 0 IO-APIC 19-fasteoi
120: 3807 0 617 0 PCI-MSI-0000:00:1f.6 0-edge eth0
121: 50 0 0 0 PCI-MSI-0000:00:16.0 0-edge mei_me
122: 0 0 0 0 PCI-MSI-0000:00:14.0 0-edge xhci_hcd
123: 0 0 2109981 0 PCI-MSI-0000:00:17.0 0-edge ahci[0000:00:17.0]
124: 0 0 0 206 PCI-MSI-0000:00:02.0 0-edge i915
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 221568 228973 472239 229475 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 16 0 0 8 IRQ work interrupts
RTR: 108 25 16 11 APIC ICR read retries
RES: 5017 5488 698 10440 Rescheduling interrupts
CAL: 4622 4584 2783 5757 Function call interrupts
TLB: 12 15 15 19 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 4 108 109 109 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event


Thanks again for taking the time to look into this.

--
Best Regards,
Yujie