Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

From: Ashok Raj
Date: Wed Aug 24 2022 - 23:29:31 EST


On Wed, Aug 24, 2022 at 09:27:55PM +0200, Borislav Petkov wrote:
> On Tue, Aug 23, 2022 at 11:13:13AM +0000, Ashok Raj wrote:
> > > > patch1:
> > > > fms3 <--- header FMS
> > > > ...
> > > > ext_sig:
> > > > fms1
> > > > fms2
> > > >
> > > > patch2: new
> > > > fms2 <--- header FMS
> > > >
> > > > Current code takes only fms3 and checks with patch2 fms2.
> > >
> > > So, find_matching_signature() does all the signatures matching and
> > > scanning already. If anything, that function should tell its callers
> > > whether the patch it is looking at - the fms2 one - should replace the
> > > current one or not.
> > >
> > > I.e., all the logic to say how strong a patch match is, should be
> > > concentrated there. And then the caller will do the according action.
> >
> > I updated the commit log accordingly. Basically find_matching_signature()
> > is only intended to find a CPU's sig/pf against a microcode image and not
> > intended to compare between two different images.
>
> Err, what?
>
> find_matching_signature() looks at fmt3 - your example above - and then
> goes and looks at ext_sig. Also your example above.

- sig = mc_saved_hdr->sig;
- pf = mc_saved_hdr->pf;

- if (find_matching_signature(data, sig, pf)) {

In the above example,

Old patch header rev is fms3
New patch header is fms2
Your CPU sig is fms2.

The code was taking mc_saved_hdr->sig (which is fms3) and looking in the
new patch (which only has fms2) now you will never find and end up in the
situation described, which is adding the new patch to the tail instead of
replacing patch1.

Also look at all the existing usages of the function and we *always* use
the fms of the CPU.

Bottom line, you never need to have more than 1 patch for a given CPU. The
list only exists since it could technically support multi-stepping in the
system.

>
> So you can teach that function to say with a *separate* return value
> "replace current patch with this new patch because this new patch is a
> better fit."

I didn't quite understand what needs to be taught.

FWIW, I rewrote the commit log with *lots* help from Dave. Hopefully the
full rewrite of the log can help jog memory about the purpose and why this
has survived all these years. Simply because the code still is capable of
picking the right microcode except in our internal testing it showed its
warts.

The new commit log below:

== Background ==

Microcode needs to be reapplied upon a resume from suspend flow. For this
purpose kernel saves a copy of the microcode after its loaded. Its possible
to build systems with two or more CPUs that have incompatible microcodes,
like if the CPUs have different stepping and each have separate microcode
files.

save_microcode_patch() is called when it comes time to load a new microcode
image into kernel. Its job is to either ADD this to the kernel cache list
if one didn't exist, or REPLACE if one was already saved in the list.
save_microcode_patch() is expected to search the saved list for the
specific CPU in the system, and replace it with the new incoming patch
being updated.

At any given time, there *must* be exactly one entry in the cache for a
given CPU in the system. If one isn't found, save_microcode_patch() just
adds this image to the list.

Lastly, microcode images have a CPU signature for compatible processors
that this image applies. The signature can be found in two places.

- The microcode header
- In the extended signature table in the image. This is located at the end
of the image itself.

for e.g. The image would consist of

microcode header
{ rev, date, checksum, sigx }
image
{ microcode image }

extended_signature_table.
sig1
sig2
sig3


When Intel releases microcode images, typically the most recent stepping is
listed in the main microcode header, and the additional (older) ones are
listed in the extended signature table.

== Problem ==
The kernel is supposed to match the CPUs signature (struct ucode_cpu_info)
with the existing patches in the microcode_cache list. But, instead it
tries to match the signature in the old microcode, against the one in the
saved patch list in microcode_cache.

This would seem to make perfect sense if both old and new had the same
signature listed in the microcode header. But if the current CPU is listed
in the extended signature (say like sig1, above), and the new patch had a
signature in the main header as sig2, and only sig2 in the new patch.
Current code would take sigx from the current image and compare with sig2.
this is a mismatch and yield no search results, and save_microcode_patch,
will keep both the old and new images.

== Impact ==

When find_patch() is called since the code only checks for entries in the
cache > what's loaded in the CPU, it will skip the one at the head (which
is the older revision) and find the one at the tail, which is the newer
revision.

-- Problem 1--
So what exactly is the problem? find_patch() will always find the correct
patch to update on resume. But we end up storing an extra image that would
never be required.

-- Problem 2 --
There was some internal version with support for microcode rollback, which
permits picking revisions that are older than currently loaded.
microcode_cache now has both the old and the new, in some situations it
selected the older patch instead of picking the newer revision at the end
of the list.

[Note, the code that surfaced the bug was purely for internal evaluation
purposes.]

Nevertheless this will just end up storing every patch that isn't required.
Kernel just needs to store the latest patch. Otherwise its a memory leak
that sits in kernel and never used. This is mostly harmless.

== Solution ==
Always search for an entry in the microcode_cache, with the interested CPU
signature instead of using the signature found in another microcode entry.

In fact every other usage of find_matching_microcode() is always called
with the CPU signature. This is the *only* instance that used the wrong
values to compare.