kernelci build/boot results on -rt kernels

From: Arnd Bergmann
Date: Thu Sep 22 2016 - 07:39:11 EST


I've tried out the v4.4.19 based kernel in three variations, to get a feeling
for what kind of regressions we add. Over a couple of days, I had a vanilla
v4.4.19 stable kernel tested, and then the -rt27 release on top of that,
with the default settings, and with a patch that forces CONFIG_PREEMPT_RT_FULL
to be enabled on all builds. Unfortunately the results are not directly
comparable, and the recent addition of the MIPS builds means that
we get a lot of extra warnings that are a bit distracting:

v4.4.19:
87 build warnings, 1 build failures, 0 boots failed, 166 passed
https://kernelci.org/build/arm-soc/kernel/v4.4-2147-g85184740541c/
https://kernelci.org/boot/all/job/arm-soc/kernel/v4.4-2147-g85184740541c/

v4.4.19-rt27 REBASE:
184 build warnings, 2 build failures, 2 boots failed, 523 passed
https://kernelci.org/build/arm-soc/kernel/v4.4-2498-gcf6c32575c8b/
https://kernelci.org/boot/all/job/arm-soc/kernel/v4.4-2498-gcf6c32575c8b/

v4.4.19-rt27 REBASE, CONFIG_PREEMPT_RT_FULL=y:
60 build warnings, 3 build failures, 8 boots failed, 61 passed
https://kernelci.org/build/arm-soc/kernel/v4.4-2499-gbb46b50a5130/
https://kernelci.org/boot/all/job/arm-soc/kernel/v4.4-2499-gbb46b50a5130/
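
Since the three runs booted very different numbers of boards, one rough way to put them side by side is the share of failed boots. The sketch below only does that arithmetic on the numbers above. The name boot_failure_percent is made up for illustration, and it assumes that failed plus passed is the full set of boots in a run:

```c
#include <assert.h>

/*
 * Percentage of failed boots in one kernelci run.
 * Hypothetical helper, not part of any kernelci tooling.
 */
static double boot_failure_percent(unsigned int failed, unsigned int passed)
{
	unsigned int total = failed + passed;

	assert(total > 0);	/* a run with no boots has no meaningful rate */
	return 100.0 * failed / total;
}
```

For the three runs this gives 0% (0 of 166), about 0.4% (2 of 525) and about 11.6% (8 of 69).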


Greg, I checked the warnings in v4.4.19. We have previously made sure that
Mark Brown's build bot has a clean build, but the kernelci build apparently
does not for v4.4. These are the commits you may want to backport to get
a clean build on x86 and ARM (all the other warnings are MIPS specific,
and have not been fixed upstream):

166c5a6ef765 ("gma500: remove annoying deprecation warning")
3610a2add393 ("mpssd: fix buffer overflow warning")
44eb0cb9620c ("drm/i915: Avoid pointer arithmetic in calculating plane surface offset")
260b31643691 ("mmc: dw_mmc: use resource_size_t to store physical address")
32844138e313 ("pinctrl: at91-pio4: use %pr format string for resource")
00affcac69c7 ("soc: qcom/spm: shut up uninitialized variable warning")
236dec051078 ("kconfig: tinyconfig: provide whole choice blocks to avoid warnings")
facc432faa59 ("net: simplify napi_synchronize() to avoid warnings")

With the same configurations on v4.4.19-rt27, one build error and a couple
of build warnings are introduced:

2098555 ("random: Make it work on rt")
drivers/hv/vmbus_drv.c:831:2: error: too few arguments to function 'add_interrupt_randomness'

6a40894 ("preempt-lazy: Add the lazy-preemption check to preempt_schedule()")
kernel/sched/core.c:3474:12: warning: 'preemptible_lazy' defined but not used [-Wunused-function]

ff1741a ("tty/serial/pl011: Make the locking work on RT")
drivers/tty/serial/amba-pl011.c: In function 'pl011_console_write':
include/linux/spinlock.h:370:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
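
The 'defined but not used' warning for preemptible_lazy has the usual shape of a static helper whose only callers sit inside an #ifdef. A minimal sketch of that shape follows. All names (TOY_CONFIG_PREEMPT, toy_preemptible_lazy, toy_preempt_schedule) are made up, and it makes no claim about what the -rt patch itself looks like:

```c
#include <stdbool.h>

/* Toggle this to model a configuration with or without preemption. */
#define TOY_CONFIG_PREEMPT 1

/*
 * Declared "static inline" rather than plain "static": gcc does not emit
 * -Wunused-function for unused static inline functions, so the helper
 * stays quiet even when TOY_CONFIG_PREEMPT is 0 and nothing calls it.
 */
static inline bool toy_preemptible_lazy(int lazy_count)
{
	return lazy_count == 0;
}

#if TOY_CONFIG_PREEMPT
/* The only caller; it disappears when TOY_CONFIG_PREEMPT is 0. */
static bool toy_preempt_schedule(int lazy_count)
{
	if (!toy_preemptible_lazy(lazy_count))
		return false;	/* lazy preemption is held off */
	return true;		/* would go on to reschedule */
}
#endif
```

With a plain "static" helper and TOY_CONFIG_PREEMPT set to 0, gcc would report the helper as defined but not used.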

And with CONFIG_PREEMPT_RT_FULL=y, there are a few more warnings and errors:

* PREEMPT_RT_FULL requires RT_MUTEX, which normally gets enabled through FUTEX,
but FUTEX may be disabled
kernel/softirq.c: In function 'softirq_check_pending_idle':
kernel/softirq.c:126:11: error: 'struct task_struct' has no member named 'pi_blocked_on'
kernel/locking/rtmutex_common.h: In function 'task_has_pi_waiters':
kernel/locking/rtmutex_common.h:61:14: error: 'struct task_struct' has no member named 'pi_waiters'
kernel/locking/rtmutex_common.h:67:85: error: 'struct task_struct' has no member named 'pi_waiters_leftmost'

* upstream driver bugs we normally don't warn about:
drivers/infiniband/ulp/ipoib/ipoib_ib.c:54:21: warning: 'pkey_mutex' defined but not used [-Wunused-variable]
drivers/rtc/rtc-m41t80.c:71:21: warning: 'm41t80_rtc_mutex' defined but not used [-Wunused-variable]

* something makes stacks grow in ARM allmodconfig, haven't checked what happens
drivers/media/usb/cx231xx/cx231xx-i2c.c:518:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/media/usb/dvb-usb-v2/mxl111sf.c:935:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/media/usb/em28xx/em28xx-camera.c:194:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/media/usb/em28xx/em28xx-camera.c:299:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/media/usb/pvrusb2/pvrusb2-eeprom.c:154:1: warning: the frame size of 1128 bytes is larger than 1024 bytes [-Wframe-larger-than=]
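
The first item above boils down to struct members that only exist under one config option being accessed from code that is enabled by another. A reduced sketch of that shape follows. Every name in it (TOY_RT_MUTEXES, struct toy_task, toy_task_is_pi_blocked) is hypothetical, and the real struct task_struct is far larger:

```c
#include <stdbool.h>
#include <stddef.h>

/* Models an option like RT_MUTEX; set to 0 to model FUTEX being disabled. */
#define TOY_RT_MUTEXES 1

struct toy_task {
	int pid;
#if TOY_RT_MUTEXES
	void *pi_blocked_on;	/* only present when TOY_RT_MUTEXES is 1 */
#endif
};

/*
 * Guarded accessor: compiles in both configurations. An unguarded
 * "task->pi_blocked_on" would fail with "has no member named" when the
 * option is off, which is the shape of the errors listed above.
 */
static bool toy_task_is_pi_blocked(const struct toy_task *task)
{
#if TOY_RT_MUTEXES
	return task->pi_blocked_on != NULL;
#else
	(void)task;
	return false;
#endif
}
```

The alternative to guarding each access would be for the RT option to enable the mutex option itself, so that the members are always present.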

The boot failures on v4.4.19-rt27 REBASE are false positives: the configuration
was already broken in v4.4.19, but that didn't get reported in time for the
kernelci mail.

We get a number of boot failures with CONFIG_PREEMPT_RT_FULL+CONFIG_PROVE_LOCKING,
not all of them all the time. I haven't looked at them in detail.

One example is:
[ 3.924111] BUG: scheduling while atomic: swapper/0/0/0x00000002
[ 3.924116] Modules linked in:
[ 3.924135] Preemption disabled at:[<c09ff2e4>] schedule_preempt_disabled+0x1c/0x20
[ 3.924137]
[ 3.924147] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.19-rt27-02499-gbb46b50a5130 #1
[ 3.924150] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 3.924172] [<c0218a34>] (unwind_backtrace) from [<c0213fb8>] (show_stack+0x10/0x14)
[ 3.924189] [<c0213fb8>] (show_stack) from [<c0494aa0>] (dump_stack+0x7c/0x90)
[ 3.924207] [<c0494aa0>] (dump_stack) from [<c026aa78>] (__schedule_bug+0x68/0xb8)
[ 3.924219] [<c026aa78>] (__schedule_bug) from [<c09feca4>] (__schedule+0x364/0x3d8)
[ 3.924228] [<c09feca4>] (__schedule) from [<c09fed70>] (schedule+0x58/0xf4)
[ 3.924238] [<c09fed70>] (schedule) from [<c0a00478>] (rt_spin_lock_slowlock+0x1cc/0x324)
[ 3.924256] [<c0a00478>] (rt_spin_lock_slowlock) from [<c0a01c28>] (rt_read_lock+0x2c/0x3c)
[ 3.924278] [<c0a01c28>] (rt_read_lock) from [<c02c6f98>] (cpu_pm_enter+0x14/0x80)
[ 3.924294] [<c02c6f98>] (cpu_pm_enter) from [<c0244c90>] (tegra114_idle_power_down+0x1c/0x78)
[ 3.924309] [<c0244c90>] (tegra114_idle_power_down) from [<c084f3a4>] (cpuidle_enter_state+0xf4/0x2c0)
[ 3.924321] [<c084f3a4>] (cpuidle_enter_state) from [<c0285ee0>] (cpu_startup_entry+0x1b8/0x298)
[ 3.924333] [<c0285ee0>] (cpu_startup_entry) from [<c0e03ca8>] (start_kernel+0x3c4/0x3d0)
[ 3.924342] [<c0e03ca8>] (start_kernel) from [<80208090>] (0x80208090)
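
Reading only the trace: the idle path calls cpu_pm_enter(), which takes a read lock that on RT goes through rt_read_lock() and rt_spin_lock_slowlock() into schedule(), while the preempt count printed in the BUG line is still nonzero (0x00000002). The user-space toy model below illustrates just that check. All toy_* names are hypothetical and none of this is kernel code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU preemption counter. */
static int toy_preempt_count;

static void toy_preempt_disable(void) { toy_preempt_count++; }
static void toy_preempt_enable(void)  { toy_preempt_count--; }

/* Blocking with a nonzero count is what "scheduling while atomic" reports. */
static bool toy_schedule_is_bug(void)
{
	return toy_preempt_count != 0;
}

/*
 * Toy model of a lock that sleeps when contended, as the read lock in the
 * trace does on RT. Returns true if taking it here would trigger the BUG.
 */
static bool toy_sleeping_read_lock(bool contended)
{
	if (!contended)
		return false;		/* fast path, no need to block */
	return toy_schedule_is_bug();	/* slow path has to schedule */
}
```

In this model the contended lock is only a problem between toy_preempt_disable() and toy_preempt_enable(), which would also fit the failures not showing up on every boot.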


A number of other builds seem to have trouble with the serial port input
in that specific configuration (CONFIG_PREEMPT_RT_FULL+CONFIG_PROVE_LOCKING), e.g.

# PYBOOT: userspace: at root shell
cat /proc/cmdline
[ 27.710068] ttyS0: 1 input overrun(s)
coc/cmne
/bin/sh: coc/cmne: not found
/ # uname -r
unme-r
/bin/sh: unme-r: not found
/ # cat /proc/cpuinfo
ca /proc/cpuinfo
/bin/sh: ca: not found
/ # dmesg -n 1
dmeg n 1
/bin/sh: dmeg: not found
/ # DMESG=$(readlink -f /bin/dmesg)
DMSG$(readlink -f /bin/dmesg)

or also this one:
/bin/sh: can't access tty; job control turned off
[ 19.347102] ttyAMA0: 1 input overrun(s)
export PS1="linaro-test (echo \$?)]# "

Arnd