Re: watchdog: pcf2127: systemd fails on 5.11

From: Guenter Roeck
Date: Mon Feb 22 2021 - 17:45:04 EST


On Thu, Feb 18, 2021 at 01:35:36PM +0100, Bruno Thomsen wrote:
> Hi,
>
> After updating the kernel from 5.8.17 to 5.11 systemd (246.6) is
> unable to init watchdog in pcf2127 during boot. Kernel option
> CONFIG_WATCHDOG_OPEN_TIMEOUT=300 is working as expected.
> It's possible to get watchdog from userspace working in
> the following 2 ways.
> 1) Disable watchdog in systemd and use busybox watchdog.
> 2) Restart systemd after boot with "kill 1".
>
> During boot setting the system clock from RTC is working.
> RTC read/write from userland with hwclock is also working.
>
> DTS: imx7d-flex-concentrator-mfg.dts
> SOC: NXP i.MX7D
> Drivers: rtc-pcf2127, spi-imx
> Communication: SPI
>
> There are no patches applied to the kernel.
>
> When systemd changes watchdog timeout it receives an
> error that to our best knowledge comes from spi-imx[1].
>
> We suspect it's a race condition between drivers or
> incompatible error handling.
>
> Any help in investigating the issue is appreciated.
>
Difficult to say without access to hardware. The code does have a
potential problem, though: It calls pcf2127_wdt_ping not only from
watchdog code but also from various rtc related functions, but there
is not access protection. This is even more concerning because the ping
function is called from an interrupt handler. At the same time, the
watchdog initialization sets min_hw_heartbeat_ms to 500, which suggests
that there may be a minimum time between heartbeats (which is clearly
violated by the current code).

Guenter

> /Bruno
>
> [1] https://elixir.bootlin.com/linux/v5.11/source/drivers/spi/spi-imx.c#L1424
>
> Relevant boot trace of pcf2127 and systemd:
>
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: pcf2127_probe
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
> mday=18, wday=04, mon=02, year=21
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
> mon=1, year=121, wday=4
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: char device (252:0)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: registered as rtc0
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: raw data is cr3=08, sec=37, min=26, hr=09,
> mday=18, wday=04, mon=02, year=21
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_rtc_read_time: tm is secs=37, mins=26, hours=9, mday=18,
> mon=1, year=121, wday=4
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: setting system
> clock to 2021-02-18T09:26:37 UTC (1613640397)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 systemd[1]: Hardware watchdog 'NXP
> PCF2127/PCF2129 Watchdog', version 0
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
> timeout: 120s (old: 60s)
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: I/O Error in PIO
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0: SPI transfer failed: -110
> Feb 18 09:26:46 tqma7 kernel: spi_master spi1: failed to transfer one
> message from queue
> Feb 18 09:26:46 tqma7 kernel: rtc-pcf2127-spi spi1.0:
> pcf2127_wdt_active_ping: watchdog restart failed, ret=-110
> Feb 18 09:26:46 tqma7 systemd[1]: Failed to set timeout to 120s:
> Connection timed out
>
>
> Relevant trace after systemd restart (kill 1):
>
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
> watchdog being stopped!
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: nowayout prevents
> watchdog being stopped!
> Feb 02 10:42:39 tqma7 kernel: watchdog: watchdog0: watchdog did not stop!
> Feb 02 10:42:39 tqma7 systemd[1]: systemd 246.6 running in system
> mode. (-PAM -AUDIT -SELINUX -IMA -APPARMOR -SMACK -SYSVINIT -UTMP
> -LIBCRYPTSETUP -GCRYPT -GNUTLS -ACL +XZ -LZ4 +ZSTD +SECCOMP +BLKID
> +ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=unified)
> Feb 02 10:42:39 tqma7 systemd[1]: Detected architecture arm.
> Feb 02 10:42:40 tqma7 systemd[1]: Hardware watchdog 'NXP
> PCF2127/PCF2129 Watchdog', version 0
> Feb 02 10:42:40 tqma7 kernel: rtc-pcf2127-spi spi1.0: new watchdog
> timeout: 30s (old: 30s)
> Feb 02 10:42:40 tqma7 systemd[1]: Set hardware watchdog to 30s.