Re: [PATCH v2 12/14] platform/x86/intel/ifs: Add current_batch sysfs entry

From: Thiago Macieira
Date: Sun Nov 13 2022 - 16:22:03 EST


On Sunday, 13 November 2022 08:58:17 PST Borislav Petkov wrote:
> * if f/m/s matches, you execute
>
> if still within the timeout, you return -EAGAIN from
> current_batch_store() to tell userspace, take a nap and try again.
>
> * if the f/m/s doesn't match you return -EINVAL to say, wrong filename,
> try the next one.

But if it matches but is corrupt or the HW fails to accept it, you also get an
error. So you now need to differentiate a failure to load a candidate file and
an attempt to load a non-candidate.

I'm assuming that the kernel would provide different error conditions for
those. But handling those in shell scripting is very difficult: you'd need to
start a subshell and parse stderr. It's MUCH easier in C, of course, but
incrementing a number is magnitudes easier in C than performing a globbing.

Either way, incrementing a number in shell is pretty easy too. The simplest
script with just the numbers would be:

i=0
while echo $i > /sys/devices/virtual/misc/intel_ifs_0/current_batch; do
test_all_cpus
done

It's four lines and does not need to know about where the scan files are
located, how they're named and if some files it may find are not to be used. But
I've hidden a lot of complexity in the test_all_cpus shell function, which
would be common to either solution of how we specify the batch to be loaded.

And this is part of my argument of why it's unlikely people will use their
shells to do this. That shell function is easily another 10 lines of
scripting, if it's meant to do its job properly. To make that easier, we've
developed two tools, one of them the OpenDCDiag tool I linked to, but both
just happen to be written in C and C++ instead of shell.

> For all Intel employees here on the thread, there's a world outside
> Intel and people do not talk (family model stepping) tuples like we do.
> All they wanna do is run their damn tests.

Indeed they do. I have personally been in contact with the few that will
represent over 90% of the deployment of this feature for the next few years.
They want this functionality to integrate with their existing health-check
scanning methodology. This is where OpenDCDiag comes in, because it does
integrate with their workflows, like logging. For another example, it obeys the
cpuset that the parent process may have set with sched_setaffinity() or alike
tools (taskset(1) or schedtool(8)). I have zero clue how to do that with shell
scripting.

Which actually means I am the maintainer of the tool that is going to be
driving 99% or more of all scans (that's why I was cc'ed in the submission). I
am your user.

I'm not saying I am the only user. I definitely want to see the best interface
so that others could write tools too if they want to. And I don't want there
to be a kludge that we need to keep compatibility with for a decade, or to set
a bad precedent. But I am giving you the constraints that I need to work under
and the kernel interface to support me. I am telling you this is very good
right now and your proposal makes it worse for me, not better, for little
apparent gain.

--
Thiago Macieira - thiago.macieira (AT) intel.com
Cloud Software Architect - Intel DCAI Cloud Engineering