Re: [PATCH 24/24] selftests/resctrl: Ignore failures from L2 CAT test with <= 2 bits

From: Reinette Chatre
Date: Fri Nov 03 2023 - 18:54:00 EST


Hi Ilpo,

On 11/3/2023 3:24 AM, Ilpo Järvinen wrote:
> On Thu, 2 Nov 2023, Reinette Chatre wrote:
>> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
>>> The L2 CAT test with a low number of bits tends to occasionally fail because
>>> of what seems to be random variation. The margin is quite small to begin with
>>> for <= 2 bits in CBM. At times, the result can even become negative.
>>> While it would be possible to allow negative values for those cases, it
>>> would be more confusing to the user.
>>>
>>> Ignore failures from the tests where <= 2 bits were used to avoid false
>>> negative results.
>>>
>>
>> I think the core message is that 2 or fewer bits should not be used. Instead
>> of running the test and ignoring the results the test should perhaps just not
>> be run.
>
> I considered that, but it often does work, so it felt like a shame to not
> present them when they're successful. Then I just had to decide how to deal
> with the cases where they failed.
>
> Also, if I make it not run down to 1 bit, those numbers will never ever
> be seen by anyone. This doesn't mean the 2 and 1 bit results contain no
> information for a human reader who is able to make a more informed decision
> about whether something is truly working or not. We could, hypothetically,
> have a HW issue one day which makes a 1-bit L2 mask misbehave, and if the
> number is never seen by anyone, it's extremely unlikely to be caught
> easily.
>
> They are just not reliable enough for a simple automated threshold currently.
> Maybe something other than the average value would be; that would need to be
> explored. I also suspect the memory address of the buffer might affect
> the value. With L3 it definitely should because of how things work, but
> I don't know if that holds for L2 too. I have earlier tried playing with
> the buffer addresses with L3, but as it didn't immediately yield a positive
> outcome to guard against outliers, I postponed that investigation (e.g.,
> my alloc pattern might have been too straightforward and didn't provide
> enough entropy into the buffer start address because I just alloc'ed n x
> buf_size buffers back-to-back).
>
> But I don't have a very strong opinion on this, so if you prefer that I
> just stop at 3 bits, I can change it.
>

We seem to have different users in mind when thinking about this. I was
considering the users that just run the selftest to get a pass/fail. You
seem to also consider folks using this for validation. I'm ok with keeping
this change to accommodate both.
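
For reference, the behavior being kept can be sketched roughly as below.
This is only a minimal illustration, not the code from the patch:
cat_step_fails(), MIN_RELIABLE_CBM_BITS, and the percentage parameters are
hypothetical names, and the cutoff of 3 bits is assumed from the "<= 2 bits"
wording in the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cutoff: results from fewer CBM bits than this are advisory. */
#define MIN_RELIABLE_CBM_BITS 3

/*
 * Decide whether one step of the test counts as a failure.
 *
 * cbm_bits:     number of bits set in the capacity bitmask for this step
 * avg_diff_pct: measured average difference for this step (may be negative)
 * min_diff_pct: threshold the difference is expected to reach
 * ignored:      set to true when the step missed the threshold but the
 *               miss is not counted because too few bits were in use
 */
static bool cat_step_fails(int cbm_bits, double avg_diff_pct,
			   double min_diff_pct, bool *ignored)
{
	bool below;

	assert(ignored);

	below = avg_diff_pct < min_diff_pct;
	*ignored = below && cbm_bits < MIN_RELIABLE_CBM_BITS;

	return below && !*ignored;
}
```

With something along these lines the 1-bit and 2-bit numbers are still
measured and can be printed for folks doing validation, while users that only
want pass/fail are not exposed to the noisy cases.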

Reinette