Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info() to reflect current microcode

From: Ashok Raj
Date: Wed Feb 01 2023 - 14:33:27 EST


On Wed, Feb 01, 2023 at 11:13:58AM -0800, Dave Hansen wrote:
> On 1/30/23 13:39, Ashok Raj wrote:
> > Currently collect_cpu_info() is only returning what was cached earlier
> > instead of reading the current revision from the proper MSR.
> >
> > Collect the current revision and report that value instead of reflecting
> > what was cached in the past.
> >
> > [TBD:
> > Need to change microcode/amd.c. I didn't quite follow the logic since
> > it reports the revision from the patch file, instead of reporting the
> > real PATCH_LEVEL MSR.
> >
> > Untested on AMD.
> > ]
>
> This thread is meandering a bit. I think it's because this changelog
> doesn't have a problem statement. It's hard to agree on a patch being a
> solution to anything if we haven't first agreed on the problem.
>
> What is the problem?

I alluded to it here, but yes, it is clearly missing from the commit log.

https://lore.kernel.org/lkml/Y9mW7EiL%2FBpYFLWn@xxxxxxxxxxxxxxxxxxxxxxxxx/

Thomas mentioned here https://lore.kernel.org/lkml/87y1pygiyf.ffs@tglx/
that error handling in the __reload_late()::wait_for_siblings() code path is
completely broken.

This is the one that I "assumed" he was referring to: all we need is to
update the current revision, but we end up depending on the behavior of
apply_microcode(), which might accidentally have some side effects.

Instead, call only collect_cpu_info() and let that update the per-cpu
revision. There is no risk in doing that, whereas accidentally letting it
fall through to an apply_microcode() might carry risks.
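
To make the difference concrete, here is a minimal user-space sketch. It is
not kernel code: struct core_model, struct thread_model, model_apply() and
model_collect() are hypothetical names invented for illustration, and the
load_attempts counter merely stands in for whatever side effects a real
apply_microcode() might have. It also assumes both threads of a core see the
same hardware revision.

```c
#include <stdint.h>

/* Hypothetical model, not kernel code: one core shared by two threads. */
struct core_model {
	uint32_t hw_rev;            /* stands in for the revision the MSR reports */
	unsigned int load_attempts; /* stands in for side effects of a load path */
};

struct thread_model {
	struct core_model *core;
	uint32_t cached_rev;        /* stands in for the per-cpu cached revision */
};

/* Models the apply path: it touches the load machinery, then caches the result. */
static void model_apply(struct thread_model *t, uint32_t patch_rev)
{
	t->core->load_attempts++;
	if (patch_rev > t->core->hw_rev)
		t->core->hw_rev = patch_rev;
	t->cached_rev = t->core->hw_rev;
}

/* Models the collect path: only re-reads the current revision into the cache. */
static void model_collect(struct thread_model *t)
{
	t->cached_rev = t->core->hw_rev;
}
```

In this model, once the primary thread has run model_apply(), the sibling
only needs model_collect() to bring its cached revision up to date, and
load_attempts stays at one.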

>
> What does this "fix"?

The code performs this delicate late-load dance to keep sibling threads
quiet while the update is performed.

At wait_for_siblings(), once all threads have arrived, the sibling then
calls apply_microcode(), which seems wrong.