Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

From: Chris Packham
Date: Thu Jun 08 2023 - 16:39:15 EST


Hi Jarkko,

On 9/06/23 03:17, Jarkko Sakkinen wrote:
> On Wed Jun 7, 2023 at 7:15 PM EEST, Jarkko Sakkinen wrote:
>> On Wed Jun 7, 2023 at 12:04 AM EEST, Chris Packham wrote:
>>> Hi Jarkko,
>>>
>>> On 6/06/23 21:39, Jarkko Sakkinen wrote:
>>>> On Sun, 2023-05-28 at 23:42 +0000, Chris Packham wrote:
>>>>> Hi,
>>>>>
>>>>> We have an embedded product with an Infineon SLM9670 TPM. After updating
>>>>> to a newer LTS kernel version we started seeing the following warning at
>>>>> boot.
>>>>>
>>>>> [    4.741025] ------------[ cut here ]------------
>>>>> [    4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled interrupts
>>>>> [    4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159
>>>>> __handle_irq_event_percpu+0xf4/0x180
>>>>> [    4.765557] Modules linked in:
>>>>> [    4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
>>>>> [    4.774747] Hardware name: Allied Telesis x250-18XS (DT)
>>>>> [    4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS
>>>>> BTYPE=--)
>>>>> [    4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
>>>>> [    4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
>>>>> [    4.797220] sp : ffff800008003e40
>>>>> [    4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27:
>>>>> ffff80000902a9b8
>>>>> [    4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24:
>>>>> ffff000001b92400
>>>>> [    4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21:
>>>>> 0000000000000000
>>>>> [    4.822053] x20: 0000000000000001 x19: ffff000002381200 x18:
>>>>> ffffffffffffffff
>>>>> [    4.829222] x17: ffff800076962000 x16: ffff800008000000 x15:
>>>>> ffff800088003b57
>>>>> [    4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12:
>>>>> 000000000000035d
>>>>> [    4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 :
>>>>> ffff8000093a5078
>>>>> [    4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 :
>>>>> ffff8000093fd078
>>>>> [    4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 :
>>>>> 0000000000000000
>>>>> [    4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>>>>> ffff8000093951c0
>>>>> [    4.872230] Call trace:
>>>>> [    4.874686]  __handle_irq_event_percpu+0xf4/0x180
>>>>> [    4.879411]  handle_irq_event+0x64/0xec
>>>>> [    4.883264]  handle_level_irq+0xc0/0x1b0
>>>>> [    4.887202]  generic_handle_irq+0x30/0x50
>>>>> [    4.891229]  mvebu_gpio_irq_handler+0x11c/0x2a0
>>>>> [    4.895780]  handle_domain_irq+0x60/0x90
>>>>> [    4.899720]  gic_handle_irq+0x4c/0xd0
>>>>> [    4.903398]  call_on_irq_stack+0x20/0x4c
>>>>> [    4.907338]  do_interrupt_handler+0x54/0x60
>>>>> [    4.911538]  el1_interrupt+0x30/0x80
>>>>> [    4.915130]  el1h_64_irq_handler+0x18/0x24
>>>>> [    4.919244]  el1h_64_irq+0x78/0x7c
>>>>> [    4.922659]  arch_cpu_idle+0x18/0x2c
>>>>> [    4.926249]  do_idle+0xc4/0x150
>>>>> [    4.929404]  cpu_startup_entry+0x28/0x60
>>>>> [    4.933343]  rest_init+0xe4/0xf4
>>>>> [    4.936584]  arch_call_rest_init+0x10/0x1c
>>>>> [    4.940699]  start_kernel+0x600/0x640
>>>>> [    4.944375]  __primary_switched+0xbc/0xc4
>>>>> [    4.948402] ---[ end trace 940193047b35b311 ]---
>>>>>
>>>>> Initially I dismissed this as a warning that would probably be cleaned
>>>>> up when we did more work on the TPM support for our product but we also
>>>>> seem to be getting some new i2c issues and possibly a kernel stack
>>>>> corruption that we've conflated with this TPM warning.
>>>> Hi, sorry for late response. I've been moving my (home) office to
>>>> a different location during last couple of weeks, and email has been
>>>> piling up.
>>>>
>>>> What does dmidecode give you?
>>>>
>>>> More specifically, I'm interested in DMI type 43:
>>>>
>>>> $ sudo dmidecode -t 43
>>>> # dmidecode 3.4
>>>> Getting SMBIOS data from sysfs.
>>>> SMBIOS 3.4.0 present.
>>>>
>>>> Handle 0x004D, DMI type 43, 31 bytes
>>>> TPM Device
>>>> Vendor ID: INTC
>>>> Specification Version: 2.0
>>>> Firmware Revision: 600.18
>>>> Description: INTEL
>>>> Characteristics:
>>>> Family configurable via platform software support
>>>> OEM-specific Information: 0x00000000
>>>>
>>>> BR, Jarkko
>>> This is an embedded ARM64 (Marvell CN9130 SoC) device so no BIOS. The
>>> relevant snippet from the device tree is
>>>
>>>         tpm@1 {
>>>                 compatible = "infineon,slb9670";
>>>                 reg = <1>; /* Chip select 1 */
>>>                 interrupt-parent = <&cp0_gpio2>;
>>>                 interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
>>>                 spi-max-frequency = <31250000>;
>>>         };
>>>
>>> and I can tell you that the specific TPM chip is an Infineon
>>> SLM9670AQ20FW1311XTMA1
>> OK, you know what I own that chip in the form of LetsTrustTPM
>> product.
>>
>> I have not used it a lot because of lack of time but I could try
>> to reproduce the bug with that and RPi 3B, or at least see what
>> happens with different hardware platform with the same TPM chip.
> I'm not a device tree expert but with my limited knowledge, I guess we
> could add a quirk that uses of_machine_is_compatible() to disable
> IRQs, i.e. base the policy on specific boards rather than specific
> chips: [*]
>
> if (of_machine_is_compatible("marvell,cn9130")) {
>         dev_notice(dev, "disable interrupts\n");
>         interrupts = 0;
> }
>
> [*] I looked up arch/arm64/boot/dts/marvell/cn9130.dtsi. I hope I picked
> the correct file.

The warning itself was resolved by bringing in a further change for the
LTS branch[1]. There does still seem to be an issue with the interrupts
actually working (same behaviour on mainline) but at least now there is
no warning and no adverse downstream effects.

As for disabling the interrupt via the device tree, I could simply
remove the interrupt properties from the board DTS (I was doing this as
a workaround before the correct fix was identified).
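
For illustration, a sketch of that workaround, assuming the same node as
quoted above with only the two interrupt properties dropped. With no
interrupt described, the driver should fall back to polling:

        tpm@1 {
                compatible = "infineon,slb9670";
                reg = <1>; /* Chip select 1 */
                /* interrupt-parent and interrupts removed (workaround) */
                spi-max-frequency = <31250000>;
        };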

[1] -
https://lore.kernel.org/linux-integrity/ac5b76af-87dc-b04d-6035-8eda8ba5ed12@xxxxxxxxxx/