Re: [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier

From: Luck, Tony
Date: Fri Aug 20 2021 - 10:43:19 EST


On Fri, Aug 20, 2021 at 02:28:45PM +0200, Borislav Petkov wrote:
> On Thu, Aug 19, 2021 at 03:44:52PM -0700, Tony Luck wrote:
> > which made sure that the logs were not lost completely by printing
> > to the console. But parsing console logs is error prone. Users
> > of /dev/mcelog should expect to find any early errors logged to
> > standard places.
>
> Yes, and for that matter, *all* consumers which register on the decoding
> chain should get a chance to look at those records...
>
> > Split the initialization code in dev-mcelog.c into:
> > 1) an "early" part that registers for mce notifications. Call this
> > directly from mcheck_init() because early_initcall() is still too late.
> > This allocation is too early for kzalloc() so use memblock_alloc().
> > 2) "late" part that registers the /dev/mcelog character device.
>
> ... but this looks like a hack to me: why aren't we adding those early
> records to the gen_pool and kick the work to consume them *only* *after*
> all consumers have been registered properly and everything is up and
> running?

How can the kernel tell that all consumers have registered? Is there
some new kernel crystal ball functionality that can predict that an
EDAC driver module is going to be loaded at some point in the future
when user space is up and running? :-)

I think the best we could do would be to set a timer for some point
far enough out (one minute? two minutes?) to give modules a chance
to load. But this seems even more hacky ... I have no idea how much
time is enough. In this particular case we know that the system
crashed before ... maybe the file systems are going to need an
fsck(8) before modules are loaded?
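
To make the timing problem concrete, here is a rough user-space toy
model of the "hold the early records, flush on a timer" idea. It is
not kernel code and nothing in it comes from the patch: the record
type, the array sizes and every function name are invented for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EARLY     8  /* hypothetical capacity for held-back records */
#define MAX_CONSUMERS 4  /* hypothetical size of the decode chain */

struct record { int bank; };              /* stand-in for a real MCE record */
typedef void (*consumer_fn)(const struct record *);

static struct record early[MAX_EARLY];    /* records held until the flush */
static size_t n_early;
static consumer_fn chain[MAX_CONSUMERS];  /* consumers registered so far */
static size_t n_consumers;

/* Two example consumers that only count what they are shown. */
static int console_seen, late_seen;
static void console_consumer(const struct record *r) { (void)r; console_seen++; }
static void late_module_consumer(const struct record *r) { (void)r; late_seen++; }

/* Hold an early record back instead of delivering it right away.
 * Returns 1 if the record was stored, 0 if the buffer is full. */
static int log_early(int bank)
{
	if (n_early == MAX_EARLY)
		return 0;
	early[n_early++].bank = bank;
	return 1;
}

/* Add a consumer to the chain. */
static void register_consumer(consumer_fn fn)
{
	assert(n_consumers < MAX_CONSUMERS);
	chain[n_consumers++] = fn;
}

/* What the timer would do when the grace period expires: hand every held
 * record to whoever has registered by now, then drop the records.
 * Returns the number of deliveries made. */
static size_t flush_on_timeout(void)
{
	size_t delivered = 0;

	for (size_t i = 0; i < n_early; i++) {
		for (size_t j = 0; j < n_consumers; j++) {
			chain[j](&early[i]);
			delivered++;
		}
	}
	n_early = 0;
	return delivered;
}
```

If one record is logged, console_consumer registers, the flush runs,
and only then late_module_consumer registers (the EDAC-module-after-
fsck case), the late consumer never sees the early record. Whatever
timeout is picked, a consumer that shows up after it is out of luck.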

-Tony