Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads

From: Borislav Petkov
Date: Wed Feb 01 2023 - 17:41:11 EST


On Wed, Feb 01, 2023 at 02:21:18PM -0800, Dave Hansen wrote:
> That works great, unless T0 experiences an error. In that case, T0 will
> jump out of __reload_late() after failing to do the update. T1 will
> come bumbling along after it and will enter ->apply_microcode(),
> blissfully unaware of T0's failure. T1 will assume that it is supposed
> to do T0's job, noting "rev < mc->hdr.rev". T1 will write the MSR while
> T0 is off doing god knows what.
>
> T1 should not even be attempting to do ->apply_microcode() because T0 is
> not quiescent.

Yah, thanks for explaining properly.

So, if T0 fails, then we will say that it failed. The ->apply_microcode()
call on T1 was never meant to apply any microcode - just to update the
cached data.
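
To make that intent concrete, here is a minimal userspace sketch of the
split: the primary thread does the actual load, and the sibling only
refreshes its cached revision, and only if the primary succeeded. This is
not the kernel code. All names (sim_core, sim_primary_update(),
sim_sibling_refresh()) are hypothetical, and the load_ok flag merely
stands in for the MSR write succeeding or failing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one core with two SMT threads (T0 and T1). */
struct sim_core {
	uint32_t hw_rev;        /* revision the core is actually running */
	uint32_t cached_rev[2]; /* per-thread cached copy of that revision */
};

enum sim_result { SIM_UPDATED, SIM_ERROR, SIM_SKIPPED };

/*
 * T0: attempt the real update. load_ok simulates whether the write
 * succeeded; on failure nothing is modified.
 */
static enum sim_result sim_primary_update(struct sim_core *c,
					   uint32_t new_rev, bool load_ok)
{
	if (!load_ok)
		return SIM_ERROR;

	c->hw_rev = new_rev;
	c->cached_rev[0] = new_rev;
	return SIM_UPDATED;
}

/*
 * T1: never loads anything. It only syncs its cached revision, and
 * only when T0 reported success.
 */
static enum sim_result sim_sibling_refresh(struct sim_core *c,
					    enum sim_result primary)
{
	if (primary != SIM_UPDATED)
		return SIM_SKIPPED;

	c->cached_rev[1] = c->hw_rev;
	return SIM_UPDATED;
}
```

In this sketch a failed T0 leaves both the core revision and T1's cached
copy untouched, so T1 never ends up doing T0's job behind its back.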

Now, if T0 fails, then it doesn't matter what T1 does - you have a
bigger problem:

A subset of the cores is running with the new microcode while the other
subset is running with the old one. Now this is a shit situation I don't
want to be in.

And I don't have a good way out of it.

Revert to the old patch? Maybe...

Retry the application on all of them again in the hope that it works this time?

What if some core touches an MSR which is being added by the new
microcode patch?

Late loading is a big PITA. As we've been preaching for a while now.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette