Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

From: Jarkko Sakkinen
Date: Fri Jun 09 2023 - 02:20:14 EST


On Thu Jun 8, 2023 at 11:39 PM EEST, Chris Packham wrote:
> Hi Jarkko,
>
> On 9/06/23 03:17, Jarkko Sakkinen wrote:
> > On Wed Jun 7, 2023 at 7:15 PM EEST, Jarkko Sakkinen wrote:
> >> On Wed Jun 7, 2023 at 12:04 AM EEST, Chris Packham wrote:
> >>> Hi Jarkko,
> >>>
> >>> On 6/06/23 21:39, Jarkko Sakkinen wrote:
> >>>> On Sun, 2023-05-28 at 23:42 +0000, Chris Packham wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We have an embedded product with an Infineon SLM9670 TPM. After updating
> >>>>> to a newer LTS kernel version we started seeing the following warning at
> >>>>> boot.
> >>>>>
> >>>>> [    4.741025] ------------[ cut here ]------------
> >>>>> [    4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled interrupts
> >>>>> [    4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.765557] Modules linked in:
> >>>>> [    4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
> >>>>> [    4.774747] Hardware name: Allied Telesis x250-18XS (DT)
> >>>>> [    4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>> [    4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.797220] sp : ffff800008003e40
> >>>>> [    4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27: ffff80000902a9b8
> >>>>> [    4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24: ffff000001b92400
> >>>>> [    4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21: 0000000000000000
> >>>>> [    4.822053] x20: 0000000000000001 x19: ffff000002381200 x18: ffffffffffffffff
> >>>>> [    4.829222] x17: ffff800076962000 x16: ffff800008000000 x15: ffff800088003b57
> >>>>> [    4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12: 000000000000035d
> >>>>> [    4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 : ffff8000093a5078
> >>>>> [    4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 : ffff8000093fd078
> >>>>> [    4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
> >>>>> [    4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000093951c0
> >>>>> [    4.872230] Call trace:
> >>>>> [    4.874686]  __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.879411]  handle_irq_event+0x64/0xec
> >>>>> [    4.883264]  handle_level_irq+0xc0/0x1b0
> >>>>> [    4.887202]  generic_handle_irq+0x30/0x50
> >>>>> [    4.891229]  mvebu_gpio_irq_handler+0x11c/0x2a0
> >>>>> [    4.895780]  handle_domain_irq+0x60/0x90
> >>>>> [    4.899720]  gic_handle_irq+0x4c/0xd0
> >>>>> [    4.903398]  call_on_irq_stack+0x20/0x4c
> >>>>> [    4.907338]  do_interrupt_handler+0x54/0x60
> >>>>> [    4.911538]  el1_interrupt+0x30/0x80
> >>>>> [    4.915130]  el1h_64_irq_handler+0x18/0x24
> >>>>> [    4.919244]  el1h_64_irq+0x78/0x7c
> >>>>> [    4.922659]  arch_cpu_idle+0x18/0x2c
> >>>>> [    4.926249]  do_idle+0xc4/0x150
> >>>>> [    4.929404]  cpu_startup_entry+0x28/0x60
> >>>>> [    4.933343]  rest_init+0xe4/0xf4
> >>>>> [    4.936584]  arch_call_rest_init+0x10/0x1c
> >>>>> [    4.940699]  start_kernel+0x600/0x640
> >>>>> [    4.944375]  __primary_switched+0xbc/0xc4
> >>>>> [    4.948402] ---[ end trace 940193047b35b311 ]---
> >>>>>
> >>>>> Initially I dismissed this as a warning that would probably be cleaned
> >>>>> up when we did more work on the TPM support for our product but we also
> >>>>> seem to be getting some new i2c issues and possibly a kernel stack
> >>>>> corruption that we've conflated with this TPM warning.
> >>>> Hi, sorry for the late response. I've been moving my (home) office to
> >>>> a different location during the last couple of weeks, and email has
> >>>> been piling up.
> >>>>
> >>>> What does dmidecode give you?
> >>>>
> >>>> More specifically, I'm interested in DMI type 43:
> >>>>
> >>>> $ sudo dmidecode -t 43
> >>>> # dmidecode 3.4
> >>>> Getting SMBIOS data from sysfs.
> >>>> SMBIOS 3.4.0 present.
> >>>>
> >>>> Handle 0x004D, DMI type 43, 31 bytes
> >>>> TPM Device
> >>>> Vendor ID: INTC
> >>>> Specification Version: 2.0
> >>>> Firmware Revision: 600.18
> >>>> Description: INTEL
> >>>> Characteristics:
> >>>> Family configurable via platform software support
> >>>> OEM-specific Information: 0x00000000
> >>>>
> >>>> BR, Jarkko
> >>> This is an embedded ARM64 (Marvell CN9130 SoC) device so no BIOS. The
> >>> relevant snippet from the device tree is
> >>>
> >>>         tpm@1 {
> >>>                 compatible = "infineon,slb9670";
> >>>                 reg = <1>; /* Chip select 1 */
> >>>                 interrupt-parent = <&cp0_gpio2>;
> >>>                 interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
> >>>                 spi-max-frequency = <31250000>;
> >>>         };
> >>>
> >>> and I can tell you that the specific TPM chip is an Infineon
> >>> SLM9670AQ20FW1311XTMA1
> >> OK, you know what, I own that chip in the form of a LetsTrustTPM
> >> product.
> >>
> >> I have not used it a lot because of lack of time, but I could try
> >> to reproduce the bug with that and an RPi 3B, or at least see what
> >> happens on a different hardware platform with the same TPM chip.
> > I'm not a device tree expert, but with my limited knowledge, I guess we
> > could add a quirk that uses of_machine_is_compatible() to disable
> > IRQs, i.e. base the policy on specific boards rather than specific
> > chips: [*]
> >
> > if (of_machine_is_compatible("marvell,cn9130")) {
> >         dev_notice(dev, "disable interrupts");
> >         interrupts = 0;
> > }
> >
> > [*] I looked up arch/arm64/boot/dts/marvell/cn9130.dtsi. I hope I picked
> > the correct file.
>
> The warning itself was resolved by bringing in a further change for the
> LTS branch[1]. There does still seem to be an issue with the interrupts
> actually working (same behaviour on mainline) but at least now there is
> no warning and no adverse downstream effects.
>
> In terms of device tree stuff to disable the interrupt I could simply
> remove the interrupt properties from the board DTS (I was doing this as
> a workaround before the correct fix was identified).

I think it would be a good call, because if a product creator *wants*
interrupts they will know it. Thus, it is perfectly fine IMHO to
disable them in the board DTS. I.e. a very different case from
PC/workstation computing.
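
As a sketch only, assuming the tpm@1 node quoted earlier in this thread
is otherwise unchanged, the board DTS without the interrupt properties
would look something like the following. With no interrupt described,
the TIS driver is expected to fall back to polling:

        tpm@1 {
                compatible = "infineon,slb9670";
                reg = <1>; /* Chip select 1 */
                /* interrupt-parent and interrupts intentionally omitted */
                spi-max-frequency = <31250000>;
        };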

BR, Jarkko