Re: [PATCH 0/5] x86/dumpstack: Cleanups and user opcode bytes Code: section

From: Eric W. Biederman
Date: Fri Feb 23 2018 - 15:13:38 EST


Josh Poimboeuf <jpoimboe@xxxxxxxxxx> writes:

> On Thu, Feb 22, 2018 at 10:42:52AM -0800, Linus Torvalds wrote:
>> So what we could perhaps do is:
>>
>> - make console_verbose() actually reset things to at least LOGLEVEL_DEBUG
>>
>> - make sure the *default* loglevel be LOGLEVEL_WARNING
>>
>> - now you can use pr_debug() in the oops code to print messages to
>> the log, but they won't be printed to the screen.
>>
>> And people who really want everything can still set a loglevel that is
>> much higher, because "console_verbose" would only do that "at least"
>> thing.
>>
>> That would seem like the best of both worlds, no?
>
> Maybe.
>
> Broadly speaking, I think our goal should be, in the worst case, to try
> to ensure that the essential data is captured.
>
> But the definitions of "worst case" and "essential data" can vary a lot,
> depending on both the user's setup and the nature of the bug. We're not
> going to be able to get it right 100% of the time.
>
> You're assuming the worst case of
>
> "an 80x25 screen is the only interface to the console".
>
> But there's another worst case of
>
> "we had unlimited serial port logging, but didn't dump enough data".
>
> With your proposal, the latter might instead become:
>
> "we had unlimited serial port logging, but didn't dump enough data
> because the default loglevel was too low."
>
> I did a little analysis of panics reported on lkml via .jpg files
> (either attached or in a URL). In the last two years I only found 11
> such reports. (And only two of them were 25x80, the rest were at least
> 47 rows.)
>
> On the other hand, I found a *ton* of panics which were copy/pasted. It
> was way too many to count, but a rough guess is about one per day.
>
> So ~1.5% of bugs are reported via cell phone camera (with only about
> 5-10% of *those* on a tiny 25x80 screen, with the rest having at least
> 47 rows).
>
> It's not very scientific, but it gives a general idea, I think. The
> cell phone camera thing has become a pretty rare way to report bugs, and
> with the proliferation of virtualization and automated testing I would
> expect that trend to continue.
>
> So my worry with your proposal is that many (or most?) people won't
> change their default log level to DEBUG, and then all these nice
> additional bits of data we're adding won't ever get printed, making
> debug harder for the ~98.5% case of sane serial port logging.
>
> But then again, I don't have any better ideas...
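The "at least" behaviour Linus proposes for console_verbose() can be
illustrated with a toy model in Python. Only two parts follow the
kernel: the LOGLEVEL_* numbers, and the rule that a message reaches the
console when its level is below console_loglevel. The Console class,
verbose_at_least() and VERBOSE_FLOOR are hypothetical. The floor used
here is an assumption, not something the proposal specifies.

```python
# Message levels, numerically matching LOGLEVEL_* in
# include/linux/kern_levels.h.
LOGLEVEL_WARNING = 4
LOGLEVEL_DEBUG = 7


class Console:
    """Toy model of the console loglevel filter; not kernel code."""

    def __init__(self, console_loglevel: int) -> None:
        self.console_loglevel = console_loglevel

    def reaches_console(self, msg_level: int) -> bool:
        # Like the kernel, suppress a message on the console when its
        # level is >= console_loglevel; strictly lower levels are printed.
        return msg_level < self.console_loglevel

    def verbose_at_least(self, floor: int) -> None:
        # Hypothetical "at least" console_verbose(): raise the threshold
        # to `floor`, but never lower a threshold the user set higher.
        self.console_loglevel = max(self.console_loglevel, floor)


# Hypothetical floor: one above LOGLEVEL_DEBUG, so that debug-level
# messages pass the strict "<" comparison above.
VERBOSE_FLOOR = LOGLEVEL_DEBUG + 1

default = Console(console_loglevel=LOGLEVEL_WARNING)
print(default.reaches_console(LOGLEVEL_DEBUG))  # False
default.verbose_at_least(VERBOSE_FLOOR)
print(default.reaches_console(LOGLEVEL_DEBUG))  # True

chatty = Console(console_loglevel=15)
chatty.verbose_at_least(VERBOSE_FLOOR)
print(chatty.console_loglevel)  # 15: the user's higher setting survives
```

The model uses max() instead of assigning a fixed value. That choice
preserves the property that people who want everything can still set a
much higher loglevel.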

Please also note that there are serial ports and there are serial ports.
There are serial ports on virtual machines that have no real line speed.
Then there are serial ports on physical hardware, some at 9600 baud,
and in all cases they are slow. So on a physical serial port terseness
is a virtue (unless the machine is completely dead).
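Here are some numbers for "slow". Assume the common 8N1 framing:
1 start bit, 8 data bits and 1 stop bit, so 10 bits go on the wire per
byte. A 9600 baud line then moves at most 960 bytes per second. The
helper below is a hypothetical back-of-the-envelope sketch, not kernel
code. It ignores line endings, flow control and console driver
overhead.

```python
def serial_seconds(num_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Lower bound on the time to push num_bytes through a UART.

    bits_per_char defaults to 10, assuming 8N1 framing:
    1 start bit + 8 data bits + 1 stop bit per byte.
    """
    chars_per_second = baud / bits_per_char
    return num_bytes / chars_per_second


screen = 80 * 25  # one full 80x25 screen of text, line endings ignored
print(round(serial_seconds(screen, 9600), 2))    # 2.08
print(round(serial_seconds(screen, 115200), 2))  # 0.17
```

At 9600 baud, each extra 80-column line in an oops costs roughly
80/960, or about 0.08 seconds.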

Then we have panics and the like that are reported by kdump. Those
should be cut-and-pastable as well, but that requires someone to have
done the work to set kdump up so that it is a reliable path.

In working on kexec-on-panic, what I have found is that the less code
you have to run in a critical path in a b0rken kernel, the higher the
chance of that code running successfully. I expect that applies to the
panic printer as much as anything else.

Eric