Re: Oops on /proc/interrupt access with 6.5-rc1

From: Shanker Donthineni
Date: Tue Jul 11 2023 - 12:56:28 EST


Hi,

On 7/11/23 10:51, Johan Hovold wrote:

Hi,

Konrad reported on IRC that he hit a segfault and hang when running
watch on /proc/interrupts with 6.5-rc1.

I tried simply catting it and hit the below oops immediately with my
X13s (aarch64).


I ran "cat /proc/interrupts" on the NVIDIA-GRACE server platform with
v6.5.0-rc1 and saw no errors. I tested with 8, 16 and 72 CPUs, limiting the
CPU count with maxcpus=, and could not reproduce the Oops in about 10
attempts.

root@Grace# uname -a
Linux Grace 6.5.0-rc1 #2 SMP Tue Jul 11 11:13:59 CDT 2023 aarch64 GNU/Linux

root@Grace# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
9: 0 0 0 0 0 0 0 0 GICv3 25 Level vgic
10: 0 0 0 0 0 0 0 0 GICv3 30 Level kvm guest ptimer
11: 0 0 0 0 0 0 0 0 GICv3 27 Level kvm guest vtimer
12: 3315 1855 1750 3268 10540 2394 8336 1607 GICv3 26 Level arch_timer
18: 0 0 0 0 0 0 0 0 GICv3 276 Edge arm-smmu-v3-evtq
19: 0 0 0 0 0 0 0 0 GICv3 277 Edge arm-smmu-v3-gerror
20: 0 0 0 0 0 0 0 0 GICv3 285 Edge arm-smmu-v3-evtq
21: 0 0 0 0 0 0 0 0 GICv3 286 Edge arm-smmu-v3-gerror
22: 0 0 0 0 0 0 0 0 GICv3 294 Edge arm-smmu-v3-evtq
23: 0 0 0 0 0 0 0 0 GICv3 295 Edge arm-smmu-v3-gerror
24: 0 0 0 0 0 0 0 0 GICv3 303 Edge arm-smmu-v3-evtq
25: 0 0 0 0 0 0 0 0 GICv3 304 Edge arm-smmu-v3-gerror
26: 3 0 0 0 0 0 0 0 GICv3 312 Edge arm-smmu-v3-evtq
27: 0 0 0 0 0 0 0 0 GICv3 313 Edge arm-smmu-v3-gerror
33: 0 0 0 0 0 0 0 0 GICv3 226 Level ACPI:Ged
34: 0 0 0 0 0 0 0 0 GICv3 227 Level ACPI:Ged
65: 1724 0 0 0 0 0 0 0 GICv3 202 Level uart-pl011
68: 0 0 0 0 0 0 0 0 ITS-MSI 1077444608 Edge ehci_hcd:usb1
69: 0 0 0 0 0 0 0 0 GICv3 23 Level arm-pmu
84: 0 150 0 0 0 0 0 0 ITS-MSI 1075314688 Edge nvme0q0
85: 0 0 0 0 0 0 0 0 ITS-MSI 1075314689 Edge nvme0q1
86: 0 0 0 0 0 0 0 0 ITS-MSI 1075314690 Edge nvme0q2
87: 0 0 0 0 0 0 0 0 ITS-MSI 1075314691 Edge nvme0q3
88: 0 0 0 0 0 0 0 0 ITS-MSI 1075314692 Edge nvme0q4
89: 0 0 0 0 10 0 0 0 ITS-MSI 1075314693 Edge nvme0q5
90: 0 0 0 0 0 0 0 0 ITS-MSI 1075314694 Edge nvme0q6
91: 0 0 0 0 0 0 0 0 ITS-MSI 1075314695 Edge nvme0q7
92: 0 0 0 0 0 0 0 0 ITS-MSI 1075314696 Edge nvme0q8
93: 0 0 0 0 0 0 0 0 ITS-MSI 1075314697 Edge nvme0q9
94: 0 0 0 0 0 0 0 0 ITS-MSI 1075314698 Edge nvme0q10
95: 0 0 0 0 0 0 0 0 ITS-MSI 1075314699 Edge nvme0q11
96: 0 0 0 0 0 0 0 0 ITS-MSI 1075314700 Edge nvme0q12
97: 0 0 0 0 0 0 0 0 ITS-MSI 1075314701 Edge nvme0q13
98: 0 0 0 0 0 0 0 0 ITS-MSI 1075314702 Edge nvme0q14
99: 0 0 0 0 0 0 0 0 ITS-MSI 1075314703 Edge nvme0q15
100: 0 0 0 0 0 0 0 0 ITS-MSI 1075314704 Edge nvme0q16
101: 0 0 0 0 0 0 0 0 ITS-MSI 1075314705 Edge nvme0q17
102: 0 0 0 0 0 0 0 0 ITS-MSI 1075314706 Edge nvme0q18
103: 0 0 0 0 0 0 0 0 ITS-MSI 1075314707 Edge nvme0q19
104: 0 0 0 0 0 0 0 0 ITS-MSI 1075314708 Edge nvme0q20
105: 0 0 0 0 0 0 0 0 ITS-MSI 1075314709 Edge nvme0q21
106: 0 0 0 0 0 0 0 0 ITS-MSI 1075314710 Edge nvme0q22
107: 0 0 0 0 0 0 0 0 ITS-MSI 1075314711 Edge nvme0q23
108: 0 0 0 0 0 0 0 0 ITS-MSI 1075314712 Edge nvme0q24
109: 0 0 0 0 0 0 0 0 ITS-MSI 1075314713 Edge nvme0q25
110: 0 0 0 0 0 0 0 0 ITS-MSI 1075314714 Edge nvme0q26
111: 0 0 0 0 0 0 0 0 ITS-MSI 1075314715 Edge nvme0q27
112: 0 0 0 0 0 0 0 0 ITS-MSI 1075314716 Edge nvme0q28
113: 0 0 0 0 0 0 0 0 ITS-MSI 1075314717 Edge nvme0q29
114: 0 0 0 0 0 0 0 0 ITS-MSI 1075314718 Edge nvme0q30
115: 0 0 0 0 0 0 0 0 ITS-MSI 1075314719 Edge nvme0q31
116: 0 0 0 0 0 0 0 0 ITS-MSI 1075314720 Edge nvme0q32
117: 0 0 0 0 0 0 0 0 ITS-MSI 1075314721 Edge nvme0q33
118: 0 0 0 0 0 0 0 0 ITS-MSI 1075314722 Edge nvme0q34
119: 0 0 0 0 0 0 0 0 ITS-MSI 1075314723 Edge nvme0q35
120: 0 0 0 0 0 0 0 0 ITS-MSI 1075314724 Edge nvme0q36
121: 0 0 0 0 0 0 0 0 ITS-MSI 1075314725 Edge nvme0q37
122: 0 0 0 0 0 0 0 0 ITS-MSI 1075314726 Edge nvme0q38
123: 0 0 0 0 0 0 0 0 ITS-MSI 1075314727 Edge nvme0q39
124: 0 0 0 0 0 0 0 0 ITS-MSI 1075314728 Edge nvme0q40
125: 0 0 0 0 0 0 0 0 ITS-MSI 1075314729 Edge nvme0q41
126: 0 0 0 0 0 0 0 0 ITS-MSI 1075314730 Edge nvme0q42
127: 0 0 0 0 0 0 0 0 ITS-MSI 1075314731 Edge nvme0q43
128: 0 0 0 0 0 0 0 0 ITS-MSI 1075314732 Edge nvme0q44
129: 0 0 0 0 0 0 0 0 ITS-MSI 1075314733 Edge nvme0q45
130: 0 0 0 0 0 0 0 0 ITS-MSI 1075314734 Edge nvme0q46
131: 0 0 0 0 0 0 0 0 ITS-MSI 1075314735 Edge nvme0q47
132: 0 0 0 0 0 0 0 0 ITS-MSI 1075314736 Edge nvme0q48
133: 0 0 0 0 0 0 0 0 ITS-MSI 1075314737 Edge nvme0q49
134: 0 0 0 0 0 0 0 0 ITS-MSI 1075314738 Edge nvme0q50
135: 0 0 0 0 0 0 0 0 ITS-MSI 1075314739 Edge nvme0q51
136: 0 0 0 0 0 0 0 0 ITS-MSI 1075314740 Edge nvme0q52
137: 0 0 0 0 0 0 0 0 ITS-MSI 1075314741 Edge nvme0q53
138: 0 0 0 0 0 0 0 0 ITS-MSI 1075314742 Edge nvme0q54
139: 0 0 0 0 0 0 0 0 ITS-MSI 1075314743 Edge nvme0q55
140: 0 0 0 0 0 0 0 0 ITS-MSI 1075314744 Edge nvme0q56
141: 0 0 0 0 0 0 0 0 ITS-MSI 1075314745 Edge nvme0q57
142: 0 0 0 0 0 0 0 0 ITS-MSI 1075314746 Edge nvme0q58
143: 0 0 0 0 0 0 0 0 ITS-MSI 1075314747 Edge nvme0q59
144: 0 0 0 0 0 0 0 0 ITS-MSI 1075314748 Edge nvme0q60
145: 0 0 0 0 0 0 0 0 ITS-MSI 1075314749 Edge nvme0q61
146: 0 0 0 0 0 0 0 0 ITS-MSI 1075314750 Edge nvme0q62
147: 0 0 0 0 0 0 0 0 ITS-MSI 1075314751 Edge nvme0q63
148: 0 0 0 0 0 0 0 0 ITS-MSI 1075314752 Edge nvme0q64
149: 0 0 0 0 0 0 0 0 ITS-MSI 1075314753 Edge nvme0q65
150: 0 0 0 0 0 0 0 0 ITS-MSI 1075314754 Edge nvme0q66
151: 0 0 0 0 0 0 0 0 ITS-MSI 1075314755 Edge nvme0q67
152: 0 0 0 0 0 0 0 0 ITS-MSI 1075314756 Edge nvme0q68
153: 0 0 0 0 0 0 0 0 ITS-MSI 1075314757 Edge nvme0q69
154: 0 0 0 0 0 0 0 0 ITS-MSI 1075314758 Edge nvme0q70
155: 0 0 0 0 0 0 0 0 ITS-MSI 1075314759 Edge nvme0q71
156: 0 0 0 0 0 0 0 0 ITS-MSI 1075314760 Edge nvme0q72
IPI0: 15 3 7 18 16 23 19 22 Rescheduling interrupts
IPI1: 4429 473 294 1307 3926 1535 1897 216 Function call interrupts
IPI2: 0 0 0 0 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 0 0 0 0 Timer broadcast interrupts
IPI5: 0 0 0 0 0 0 0 0 IRQ work interrupts
IPI6: 0 0 0 0 0 0 0 0 CPU wake-up interrupts
Err: 0
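
For anyone who wants to repeat the read test, here is a minimal sketch of
the loop in Python. The function name, the default attempt count and the
delay parameter are my own choices for illustration, not an existing tool.
The delay is there because the original report says the Oops did not trigger
right after boot. Note that an Oops in the read path kills the reading
process, so a run that never prints its result is itself the signal.

```python
import sys
import time


def read_repeatedly(path="/proc/interrupts", attempts=10, delay=1.0):
    """Read `path` `attempts` times and return how many reads succeeded.

    A read counts as successful when it completes and returns data.
    """
    succeeded = 0
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "r") as f:
                data = f.read()
        except OSError as exc:
            # Report the failure and keep going with the next attempt.
            print(f"attempt {attempt}: read failed: {exc}", file=sys.stderr)
        else:
            if data:
                succeeded += 1
        if delay and attempt < attempts:
            time.sleep(delay)  # pause between reads
    return succeeded
```

Calling read_repeatedly(attempts=10) and comparing the return value with 10
gives the same pass/fail answer as running cat by hand ten times.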

Commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor
management") stood out when skimming the log, and Marc soon suggested
the same possible culprit on IRC.

I have not been able to reproduce it with the maple tree patch reverted,
but I hit it again after adding it back. It did not trigger immediately
after boot, though; the machine had been idling for a few minutes in
between.

Marc asked for a dump, so I figured I'd CC the list as well.

Johan


[ 2546.693932] Unable to handle kernel paging request at virtual address ffff80008106bb19
[ 2546.695148] Mem abort info:
[ 2546.695562] ESR = 0x0000000096000007
[ 2546.695976] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2546.696394] SET = 0, FnV = 0
[ 2546.696807] EA = 0, S1PTW = 0
[ 2546.697220] FSC = 0x07: level 3 translation fault
[ 2546.697642] Data abort info:
[ 2546.698066] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 2546.698494] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 2546.698922] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 2546.699355] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000002d7a05000
[ 2546.699792] [ffff80008106bb19] pgd=10000001000a5003, p4d=10000001000a5003, pud=10000001000a6003, pmd=1000000100d5a003, pte=0000000000000000
[ 2546.700387] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 2546.700796] Modules linked in: snd_soc_wsa883x q6prm_clocks q6apm_lpass_dais snd_q6dsp_common q6apm_dai q6prm michael_mic cbc des_generic libdes ecb algif_skcipher md5 algif_hash af_alg ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_tcpudp snd_q6apm xt_conntrack nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter r8152 mii qrtr_mhi panel_edp snd_soc_hdmi_codec venus_enc venus_dec apr videobuf2_dma_contig videobuf2_memops fastrpc qrtr_smd rpmsg_ctrl rpmsg_char qcom_pm8008_regulator qcom_battmgr pmic_glink_altmode ath11k_pci ath11k venus_core snd_soc_wcd938x v4l2_mem2mem hci_uart mac80211 msm videobuf2_v4l2 snd_soc_wcd938x_sdw snd_soc_sc8280xp libarc4 btqca snd_soc_qcom_common regmap_sdw snd_soc_lpass_rx_macro videodev snd_soc_lpass_va_macro soundwire_qcom leds_qcom_lpg snd_soc_lpass_wsa_macro bluetooth snd_soc_lpass_tx_macro qcom_spmi_adc_tm5 snd_soc_wcd_mbhc snd_soc_qcom_sdw snd_soc_lpass_macro_common cfg80211 gpu_sched gpio_sbu_mux videobuf2_common qcom_spmi_temp_alarm snd_soc_core
[ 2546.700875] qcom_spmi_adc5 ecdh_generic drm_display_helper ecc qcom_pon mc snd_compress qcom_q6v5_pas industrialio rtc_pm8xxx reboot_mode phy_qcom_qmp_combo mhi led_class_multicolor nvmem_qcom_spmi_sdam drm_dp_aux_bus rfkill qcom_vadc_common snd_pcm qcom_pil_info drm_kms_helper qcom_common phy_qcom_edp qcom_pm8008 qcom_stats qrtr qcom_glink_smem snd_timer typec videocc_sc8280xp icc_bwmon qcom_q6v5 phy_qcom_qmp_usb pinctrl_sc8280xp_lpass_lpi regmap_i2c snd qcom_sysmon phy_qcom_snps_femto_v2 pmic_glink soundwire_bus pinctrl_lpass_lpi pdr_interface lpasscc_sc8280xp icc_osm_l3 mdt_loader soundcore socinfo qcom_wdt qcom_rng qmi_helpers pwm_bl drm dm_mod ip_tables x_tables ipv6 pcie_qcom crc8 phy_qcom_qmp_pcie nvme nvme_core hid_multitouch i2c_qcom_geni i2c_hid_of i2c_hid i2c_core
[ 2546.705703] CPU: 4 PID: 610 Comm: cat Not tainted 6.5.0-rc1 #45
[ 2546.706287] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
[ 2546.706880] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2546.707476] pc : string+0x4c/0xfc
[ 2546.708080] lr : vsnprintf+0x170/0x748
[ 2546.708674] sp : ffff800083563ac0
[ 2546.709265] x29: ffff800083563ac0 x28: ffff11b942bca791 x27: ffffbb03f92e0974
[ 2546.709866] x26: ffffbb03f92e0974 x25: 0000000000000020 x24: 0000000000000871
[ 2546.710476] x23: 00000000ffffffd8 x22: ffffbb03f9161778 x21: ffff800083563c10
[ 2546.711083] x20: ffff11b942bca78f x19: ffff11b942bcb000 x18: 0000000000000020
[ 2546.711688] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
[ 2546.712297] x14: 0000000000000001 x13: 0000000000000003 x12: ffff11b942bca783
[ 2546.712910] x11: 0000000000000000 x10: 0000000000000020 x9 : 0000000000000000
[ 2546.713522] x8 : 00000000ffffffff x7 : ffff800083563c10 x6 : 0000000000000020
[ 2546.714133] x5 : ffff11b942bcb000 x4 : 0000000000000000 x3 : ffff0a00ffffff04
[ 2546.714752] x2 : ffff80008106bb19 x1 : ffffffffffffffff x0 : ffff11b942bca791
[ 2546.715362] Call trace:
[ 2546.715962] string+0x4c/0xfc
[ 2546.716557] vsnprintf+0x170/0x748
[ 2546.717152] seq_printf+0xb4/0xd0
[ 2546.717746] show_interrupts+0x2f4/0x4e8
[ 2546.718345] seq_read_iter+0x3bc/0x4ac
[ 2546.718940] proc_reg_read_iter+0x84/0xd8
[ 2546.719539] vfs_read+0x1d4/0x294
[ 2546.720137] ksys_read+0x68/0xf4
[ 2546.720735] __arm64_sys_read+0x1c/0x28
[ 2546.721335] invoke_syscall+0x48/0x114
[ 2546.721934] el0_svc_common.constprop.0+0x60/0x10c
[ 2546.722536] do_el0_svc+0x30/0x88
[ 2546.723132] el0_svc+0x40/0xac
[ 2546.723729] el0t_64_sync_handler+0xc0/0xc4
[ 2546.724329] el0t_64_sync+0x190/0x194
[ 2546.724930] Code: 91000400 110004e1 eb08009f 540000e0 (38646846)
[ 2546.725536] ---[ end trace 0000000000000000 ]---
[ 2546.726143] note: cat[610] exited with irqs disabled
[ 2546.726781] note: cat[610] exited with preempt_count 1
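
To cross-check the fault information in the dump above, here is a small
Python sketch that splits the ESR value into its fields and decodes the
faulting instruction word from the Code: line. The helper names are
hypothetical (they are not kernel functions). The bit positions are the
architectural ESR_ELx layout for data aborts and the A64 LDRB
(register offset) encoding.

```python
def decode_esr(esr):
    """Split an ESR_ELx value into the fields the oops header prints."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # 0 = read access, 1 = write access
        "FSC": iss & 0x3F,             # fault status code: ISS bits [5:0]
    }


def describe_fsc(fsc):
    """Name translation faults; any other code is returned as hex."""
    if fsc & 0x3C == 0x04:             # 0b0001LL: translation fault, level LL
        return f"level {fsc & 0x3} translation fault"
    return f"FSC {fsc:#04x}"


def decode_ldrb_reg(word):
    """Decode an A64 LDRB (register offset) word; None for anything else."""
    if word & 0xFFE00C00 != 0x38600800:
        return None
    return {
        "Rt": word & 0x1F,             # destination register
        "Rn": (word >> 5) & 0x1F,      # base register
        "Rm": (word >> 16) & 0x1F,     # offset register
    }


fields = decode_esr(0x96000007)        # ESR from the dump
insn = decode_ldrb_reg(0x38646846)     # word in parentheses on the Code: line
regs = {2: 0xFFFF80008106BB19, 4: 0}   # x2 and x4 from the register dump
fault_addr = regs[insn["Rn"]] + regs[insn["Rm"]]
print(hex(fields["EC"]), describe_fsc(fields["FSC"]), hex(fault_addr))
```

The decoded fields agree with the header (EC 0x25, a read, level 3
translation fault), and the instruction is a byte load from x2 + x4, which
gives the reported faulting address. That is consistent with string()
dereferencing the pointer it was handed through seq_printf(); it does not by
itself say why that pointer refers to an unmapped page.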