RE: [PATCH v2] platform/x86/intel/ifs: Allow non-default names for IFS image

From: Luck, Tony
Date: Thu Jul 28 2022 - 11:12:35 EST


>>> Speculating myself: as far as I understand, IFS is not for factory
>>> tests but for testing in the field, since big cloud vendors have
>>> found that sometimes there are hard-to-catch CPU defects which
>>> they only find out by running statistics which show that certain
>>> tasks only crash when run on machine a, socket b, core c.
>>
>> Who knows, Intel doesn't say so we can't really guess :(
>
>Right, for version 3 the commit message and ABI documentation changes
>really need to clarify why multiple test-pattern files may be needed
>much better. If possible please also include 1 or 2 _clear_ examples
>of cases where more than 1 test-pattern file may be used.

Sorry for the radio silence. We took to heart Greg's suggestion to go
back and think this out completely. As he said, there is no rush to get
this in. We need to do it right.

Your summary above on how this works is completely correct.

The reason for adding more files is to cover more transistors in the
core. The base file that we started with gets mumble-mumble percent
of the transistors checked. Adding a few more files will increase that
quite significantly.

So testing a system may look like:

	for each scan file
	do
		load the scan file
		for each core
		do
			test the core with this set of tests
		done
	done
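
The same loop as a minimal Python sketch. The two callables,
load_scan_file() and test_core(), are hypothetical stand-ins for whatever
the final driver interface provides; nothing here reflects actual sysfs
file names, which are still being decided.

```python
def run_all_scans(scan_files, cores, load_scan_file, test_core):
    """Run every core against every scan file; return per-pair results.

    load_scan_file and test_core are hypothetical callables supplied by
    the caller; they stand in for the real (undecided) driver interface.
    """
    results = {}
    for scan_file in scan_files:
        # Load one scan file, then test all cores with it before moving on.
        load_scan_file(scan_file)
        for core in cores:
            results[(scan_file, core)] = test_core(core)
    return results
```

Loading each file once in the outer loop and sweeping all cores in the
inner loop means every core gets tested against every file with only one
load per file.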

Our internal discussions on naming are following the same direction that
you suggested, but likely even more restrictive. The "suffix" may just be
a two-digit hex number (allowing for up to 256 files ... though for Sapphire
Rapids we are looking at just 6).

So our current direction is to name six "parts" something like this:

06-8f-06-00.scan
06-8f-06-01.scan
06-8f-06-02.scan
06-8f-06-03.scan
06-8f-06-04.scan
06-8f-06-05.scan

but we are still checking to make sure this will work for future CPUs. Once
we have something solid we will come back to the mailing list.
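
To illustrate the naming under discussion, here is a small Python sketch
that generates such names. It assumes the first three fields are the
CPU family, model and stepping in two-digit hex and that parts are
numbered from 00; the helper name scan_file_names() is made up for this
example, and the scheme itself may still change.

```python
def scan_file_names(family, model, stepping, parts):
    """Return names like "06-8f-06-00.scan" for parts 0..parts-1.

    Assumes a two-digit lowercase hex suffix, so at most 256 parts.
    """
    if not 0 <= parts <= 256:
        raise ValueError("a two-digit hex suffix allows at most 256 parts")
    prefix = f"{family:02x}-{model:02x}-{stepping:02x}"
    return [f"{prefix}-{part:02x}.scan" for part in range(parts)]


# Print the six Sapphire Rapids names listed above.
for name in scan_file_names(0x06, 0x8F, 0x06, 6):
    print(name)
```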

As also suggested in an earlier thread, we will change the name of the
"reload" file (since skipping to a new file isn't a "reload"). The "load a
scan file" step will write the "part" number to this new file.

-Tony