Re: [PATCH v5 2/2] printk: console: Remove sysrq exception

From: Petr Mladek
Date: Tue May 02 2023 - 06:29:14 EST


On Sun 2023-04-30 19:00:42, calumlikesapplepie@xxxxxxxxx wrote:
> > > --- a/drivers/tty/sysrq.c
> > > +++ b/drivers/tty/sysrq.c
> > > @@ -101,12 +102,26 @@ __setup("sysrq_always_enabled",
> > > sysrq_always_enabled_setup);
> > >  static void sysrq_handle_loglevel(int key)
> > >  {
> > >         int i;
> > > +       int cookie;
> > > +       int warned = 0;
> > > +       struct console *con;
> > >  
> > >         i = key - '0';
> > >         console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
> > >         pr_info("Loglevel set to %d\n", i);
> > >         console_loglevel = i;
> > > +
> > > +       cookie = console_srcu_read_lock();
> > > +       for_each_console_srcu(con) {
> > > +               if (!warned && per_console_loglevel_is_set(con)) {
> > > +                       warned = 1;
> > > +                       pr_warn("Overriding per-console loglevel
> > > from sysrq\n");
> > > +               }
> > > +               con->level = -1;
> >
> > Please, use WRITE_ONCE(con->level, -1) to make sure that it will be
> > atomic.
> >
> > READ_ONCE()/WRITE_ONCE() should always be used when the values
> > are read/written using RCU synchronization. Otherwise the compiler
> > might do some optimizations and read/write bytes separately.
> >
>
> I would argue we remove this whole logic, and just go to a nice and
> simple printk at KERN_EMERG priority. Why add interactions with the
> SRCU subsystem in code that we need to run even in the event of some
> massive SRCU bug? This code will be both infrequently tested and make
> a bad day worse if it's problematic.

Any bug in SRCU should not break this sysrq code.
console_srcu_read_lock()/unlock() are non-blocking operations.

The problem might be in a path calling synchronize_srcu()
but it is not sysrq case, definitely.

> > > +       }
> > > +       console_srcu_read_unlock(cookie);
> > >  }
> >
> > Also we should safe/set/restore "ignore_per_console_loglevel"
> > in __handle_sysrq(). It already does the same with
> > "console_loglevel".
> >
> > __handle_sysrq() increases the loglevel to show all
> > messages printed by the sysrq handler on all consoles.
> > Many handlers print some information that might be useful
> > for debugging.
>
> Actually, it doesn't! Be kinda cool if it did, but there is a
> legitimate reason for it not to: some sysrq messages, like the list of
> all processes in the system, are extremely long, and so you might want
> to print them to a file but not every console. Instead, it prints
> exactly one message with the increased log level visibility: an info-
> priority message that states the "action message" of the sysrq, before
> returning to the default log level.

Ah, I forgot and misread the code. It really prints only the single
line with updated console loglevel.

> In other words: magic sysrq messages are printed to every console,
> because they are deemed to be important emergencies that no user should
> miss, even if they told the kernel it better be actively on fire if it
> wants to talk. Cool. However, they are not marked as emergency
> priority. Instead, they're marked as info priority, but do a dance
> with kernel log level to ensure they get printed everywhere.

I have different theory why it works this way, see below.


> So I dove into git history, only to find that the log-level shifting code was
> added with the initial import from bitkeeper.
[...]
> Sysrq.c does not exist in the previous kernel version.
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/v2.1/linux-2.1.43.tar.gz

It seems that the original code actually printed all messages with
the updated console_loglevel.

It was modified to printk only the first line or help by:

commit 2433aae9cbfbe77b5c5af11e6174d390e06053a6
Author: linus1 <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sun Sep 23 11:00:00 2001 -0600

v2.4.10.1 -> v2.4.10.2

- me/Al Viro: fix bdget() oops with block device modules that don't
clean up after they exit
- Alan Cox: continued merging (drivers, license tags)
- David Miller: sparc update, network fixes
- Christoph Hellwig: work around broken drivers that add a gendisk more
than once
- Jakub Jelinek: handle more ELF loading special cases
- Trond Myklebust: NFS client and lockd reclaimer cleanups/fixes
- Greg KH: USB updates
- Mikael Pettersson: sparate out local APIC / IO-APIC config options

, see
https://github.com/mpe/linux-fullhistory/commit/2433aae9cbfbe77b5c5af11e6174d390e06053a6

I do not see any explanation for this. I guess that the too long
output caused some problems.

> No, I could not find the email where this patch was first submitted.
> It may not exist on the internet; it certainly doesn't in the parts
> where Google looks.
>
> Now, printk did support priority levels at this point: without the
> emails discussing the patch, therefore, it's impossible to say why
> Martin decided all those years ago to use this technique. It's
> probably a moot point anyways.

I agree that it is pretty questionable. I am not sure about
the exact motivation.

Well, sysrq is primary used when the system is not responsive.
It is good to know that it is being handled by printing
the first line at least. And it is handy when it does this
out of box.

IMHO, the debug output is not printed because it might be too
much for slow consoles. It might actually be an advantage to
distinguish the log level of the various messages. It allows
to filter the messages on the console.

Note that the console loglevel could be set by sysrq-0 .. 9.

One use case might be to print the debug messages into
the internal log buffer and then trigger crash to
produce crash dump. It would allow to see the log even
when it is not shown to the console. And the log might
be much easier way than digging the information from
the various structures in the crashdump.

> The fact that sysrq messages behave in this way isn't well documented,
> and is definitely harmful. Users might (rightfully) assume that if
> they set the log level to 1, they will not see any messages that are
> not of EMERGENCY or ALERT priority.

sysrq does not know in which state the system is. It might be
called even on a normally running system. Will the EMERGENCY
or ALERT be correct in that case?

> If your system experiences a sysrq, either you have some weird backup
> software that is using the wrong interface

Is there any backup system doing this? Or is it just some wild theory?

> , or someone with extremely
> privileged access to your system believes that there is something so
> fundamentally wrong with your system that they need to bypass the
> entirety of userspace and much of the kernel to get something done.

My understanding is that sysrq is primary used when userspace
does not longer work. IMHO, the original use-case was to
trigger it from the keyboard.

> Either of those situations are at least as important as a typo in a
> password for sudo; which is given a CRITICAL priority.
>
> Lets not add a pile of code in order to maintain a behavior that no
> sane userspace will be depending on, and which might even be causing
> bugs in sane userspaces. Like, for instance, systemd-journald deciding
> not to write out journals when I instruct my kernel to do an emergency
> sync.

Honestly, I am not sure what would be your preferred behavior.
It might be because I am not a native speaker. And the mail is
really long.

Is the problem that systemd-journald did not write the log?
Or is the problem that it did eat 15% CPU?

Eating 15% CPU looks like a bug. The fact that it did not write
anything might be because of the OOM situation. Most things get
blocked when there is no memory.

What exact sysrq behavior would you suggest, please?


> > 4. Add ignore_per_console_loglevel parameter, use it
> >    in per_console_loglevel_is_set(), do_syslog(),
> >    and __handle_sysrq().
>
> In other words: sysrq's use of the printk subsystem in this way is
> unique, and thus almost certainly a bad idea.

sysrq is very old interface. Various people might expect different
behavior depending on the use case. It might be impossible to
make all people happy.

Changing the default behavior a significant way might be seen as a regression.
Especially, printing all messages with EMERGENCY loglevel looks like
a pretty bad idea because it would prevent any filtering on the
console level.

My feeling is that your primary problem is somewhere else,
systemd-journald or OOM behavior.

Best Regards,
Petr