Re: [regression] microcode files missing in initramfs images from dracut (was Re: [PATCH] x86: Clean up remaining references to CONFIG_MICROCODE_AMD)

From: Borislav Petkov
Date: Wed Nov 22 2023 - 10:58:27 EST


Sure,

lemme do that.

Hi Linus,

we have a disagreement on what is a userspace regression and what is
not.

The whole thread starts here:

https://lore.kernel.org/r/c67bd324-cec0-4fe4-b3b1-fc1d1e4f2967@xxxxxxxxxxxxx

and I'm leaving Thorsten's arguments fully quoted below for more
context.

Basically, dracut has been grepping the kernel's .config to figure out
whether to add microcode blobs to the initrd or not.

Now, we changed a CONFIG and it broke. Again. It wasn't the first time.

Dracut went and fixed it this way:

https://github.com/dracutdevs/dracut/commit/6c80408c8644a0add1907b0593eb83f90d6247b1

which will break next time we change stuff.

IMO, yes, we should not break userspace but dracut is special. It parses
kernel internals willy-nilly, and those are not ABI to begin with.

Looking at that dracut function check_kernel_config(), it does:

# no kernel config file, so return true
[[ $_config_file ]] || return 0

if it can't find a kernel .config in the two places it looks, which is
just silly: when there's no .config, it simply returns true and includes
those microcode blobs. Might as well hide the config as a fix. :-)
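
To make the failure mode concrete, here is a minimal sketch of that kind
of check. It is not dracut's actual code: the function name, the argument
order and the grep pattern are assumptions made for illustration; only
the "no config file, so return true" fallback comes from the snippet
above.

```shell
# Hypothetical sketch, NOT dracut's real check_kernel_config().
# Usage: check_kernel_config_sketch CONFIG_FILE SYMBOL
# Returns 0 (true) if SYMBOL is set to y or m in CONFIG_FILE, or if no
# config file was given at all (the permissive fallback).
check_kernel_config_sketch() {
    # no kernel config file, so return true
    [ -n "$1" ] || return 0

    # Assumed matching rule: the symbol must be enabled as =y or =m.
    grep -q -E "^$2=(y|m)\$" "$1"
}
```

With a .config that only carries CONFIG_MICROCODE=y, asking for
CONFIG_MICROCODE_AMD fails and the blobs are left out, while passing an
empty config path succeeds and the blobs go in.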

What it should do is parse the .notes section of vmlinux, for which
I have a proper fix:

https://lore.kernel.org/r/20231122132419.GBZV4BA399sG2JRFAJ@fat_crate.local

So IMNSVHO, CONFIG symbols are not an ABI.

If there's some other userspace tool which goes and greps the kernel
sources and looks for a particular function or symbol which is not even
exported, does that mean that we won't be able to change that function
name or symbol anymore just because some random tool touches it?

Yes, I know, we should not break userspace but there has to be some
sensible limit somewhere as to what constitutes a userspace breakage.

At the end of the day, that's your call.

If we consider this a userspace breakage, I would add back those
CONFIG_MICROCODE_INTEL and CONFIG_MICROCODE_AMD Kconfig symbols and
every time I add a new CONFIG symbol, I should probably write a big fat
note above it that userspace should not rely on it existing forever...

Thx.

On Wed, Nov 22, 2023 at 04:34:03PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> Preface: considered CCing Linus here, as it's quite possible that I'm
> wrong, as every situation is somewhat different. If anybody disagrees
> with what I bring up below to hopefully clarify things thus please do me
> a favor and CC Linus so he can clarify things.
>
> Ohh, and sorry for being a PITA. I hate that, but when it comes to
> regressions disagreements often happen, as all those discussions linked
> at the end of https://docs.kernel.org/process/handling-regressions.html
> illustrate.
>
> On 22.11.23 12:58, Borislav Petkov wrote:
> > On Wed, Nov 22, 2023 at 10:15:42AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> [1] unless you fiddle with things obviously internal; not sure if this
> >> case would qualify for him, but somehow I doubt it -- but I might be
> >> wrong there.
> >
> > Well, think about it - by that logic, if CONFIG_* items are an ABI, we
> > will never ever be able to change any of them. [...]
>
> Can't follow your logic (or the one from Lukas in the other reply), as
> what's an ABI (or an API) is afaik not the important factor when it
> comes to the "no regressions" rule: you can change things (including
> ABIs or APIs) all you want, as long as nothing breaks. To quote Linus from
> https://lore.kernel.org/all/CAHk-=wiVi7mSrsMP=fLXQrXK_UimybW=ziLOwSzFTtoXUacWVQ@xxxxxxxxxxxxxx/
>
> ```
> The rules about regressions have never been about any kind of
> documented behavior, or where the code lives.
>
> The rules about regressions are always about "breaks user workflow".
>
> The other side of the coin is that people who talk about "API
> stability" are entirely wrong. API's don't matter either. You can make
> any changes to an API you like - as long as nobody notices.
>
> Again, the regression rule is not about documentation, not about
> API's, and not about the phase of the moon.
>
> It's entirely about "we caused problems for user space that used to work".
> ```
>
> >> BTW: I see that this could help preventing problems like the current one
> >> to happen in the far future. But how would that help the current
> >> situation (e.g. users that have an old dracut and updated the kernel
> >> without updating dracut)?
> > Update dracut too?
>
> To quote Linus again, this time from
> https://lore.kernel.org/lkml/CA+55aFxW7NMAMvYhkvz1UPbUTUJewRt6Yb51QAx5RtrWOwjebg@xxxxxxxxxxxxxx/
>
> ```
> People should basically always feel like they can update their kernel
> and simply not have to worry about it.
>
> I refuse to introduce "you can only update the kernel if you also
> update that other program" kind of limitations. If the kernel used to
> work for you, the rule is that it continues to work for you.
>
> There have been exceptions, but they are few and far between,
> [...]
> But if something actually breaks, then the change must get fixed or
> reverted. And it gets fixed in the *kernel*. Not by saying "well, fix
> your user space then".
> ```
>
> Are those quotes fitting to the situation at hand? Not totally sure.
> Initramfs generators might be special and we have done exceptions for
> them in the past if no other solution could be found to prevent a
> regression[1]. We'd need Linus to clarify.
>
> Ciao, Thorsten
>
> [1] maybe it's a naive idea, but can't we just avoid the problem at hand
> by adding CONFIG_MICROCODE_AMD and CONFIG_MICROCODE_INTEL back as a
> hidden config stub and remove those in ~3 years? Yeah, ugly, but we have
> done things way more ugly than that to prevent regressions.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette