Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices

From: Petr Mladek
Date: Mon Nov 15 2021 - 05:11:34 EST


On Fri 2021-11-12 11:08:33, Wander Costa wrote:
> On Fri, Nov 12, 2021 at 5:45 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> > A workaround is to lower console_loglevel and show only the most
> > important messages. Sometimes, a reasonable solution is to ratelimit
> > repeated messages.
> >
> > Which brings the question. What is the motivation for this patch,
> > please?
> >
> > Is it motivated by a particular bug report?
> > Or does experience show that this report causes more harm than
> > good?
> >
> QA has a test case in which they need to load hundreds of SCSI devices,
> and they simulate it using the scsi_debug driver:

I think that SCSI devices were the first sinner who motivated the work
on console offloading here at SUSE.

> modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600
>
> This dumps a bunch of messages to print and the serial console driver
> cannot keep up with the data rate, causing an RCU stall. The stall is
> reported in an IRQ context, then the ring buffer flush continues from
> there, and then it causes a soft lockup.

I usually suggest reducing console_loglevel as a temporary workaround,
but I am not sure whether that is acceptable for QA.

The loglevel could be lowered only around this test and restored
afterwards. I mean something like:

# Save the current settings so they can be restored afterwards.
CONSOLE_LOGLEVEL=$(cat /proc/sys/kernel/printk)
IGNORE_LOGLEVEL=$(cat /sys/module/printk/parameters/ignore_loglevel)

# Show only messages more severe than KERN_ERR on the console.
echo "3 4 1 7" >/proc/sys/kernel/printk
echo N >/sys/module/printk/parameters/ignore_loglevel

modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600

# Restore the original settings.
echo $CONSOLE_LOGLEVEL >/proc/sys/kernel/printk
echo "$IGNORE_LOGLEVEL" >/sys/module/printk/parameters/ignore_loglevel
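
If the test harness can carry a small helper, the same save/lower/restore
sequence could be wrapped so that the settings are restored even when the
command fails. This is only a rough sketch: the function name
with_quiet_console and the PRINTK_FILE and IGNORE_FILE variables are made
up for illustration, and they simply default to the two paths used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): run a command with
# a lowered console loglevel and restore the previous settings afterwards.
# PRINTK_FILE and IGNORE_FILE are illustrative overrides for the defaults.
PRINTK_FILE=${PRINTK_FILE:-/proc/sys/kernel/printk}
IGNORE_FILE=${IGNORE_FILE:-/sys/module/printk/parameters/ignore_loglevel}

with_quiet_console() {
    # Remember the current settings.
    saved_printk=$(cat "$PRINTK_FILE")
    saved_ignore=$(cat "$IGNORE_FILE")

    # Only the first number (console_loglevel) is effectively changed;
    # the other three are the usual defaults from the example above.
    echo "3 4 1 7" >"$PRINTK_FILE"
    echo N >"$IGNORE_FILE"

    # Run the wrapped command and keep its exit status.
    "$@"
    rc=$?

    # $saved_printk is left unquoted on purpose so that the tab-separated
    # values are re-joined with single spaces before being written back.
    echo $saved_printk >"$PRINTK_FILE"
    echo "$saved_ignore" >"$IGNORE_FILE"
    return $rc
}
```

It would then be called as root, for example:

with_quiet_console modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600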


Note that /proc/sys/kernel/printk is a horrible interface. Only the
first number matters here: it is console_loglevel, the limit used for
filtering messages shown on the console. The levels are defined in
include/linux/kern_levels.h.
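
To make the four numbers less cryptic, here is a small sketch that labels
them. The field names follow the sysctl documentation for kernel.printk;
the function name describe_printk is hypothetical. Only messages with a
level numerically lower than console_loglevel reach the console, so the
value 3 lets through KERN_EMERG (0), KERN_ALERT (1), and KERN_CRIT (2).

```shell
#!/bin/sh
# Hypothetical helper: label the four values read from
# /proc/sys/kernel/printk. The values are passed in as one string.
describe_printk() {
    # Split the single argument on whitespace (tabs or spaces).
    set -- $1
    printf 'console_loglevel=%s\n' "$1"
    printf 'default_message_loglevel=%s\n' "$2"
    printf 'minimum_console_loglevel=%s\n' "$3"
    printf 'default_console_loglevel=%s\n' "$4"
}
```

For example: describe_printk "$(cat /proc/sys/kernel/printk)"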

Best Regards,
Petr