Re: Initial testing of MPAM patches

From: Carl Worth
Date: Fri Aug 18 2023 - 12:16:35 EST


James Morse <james.morse@xxxxxxx> writes:
> On 18/08/2023 15:13, Carl Worth wrote:
>
>> 1. Is there a way to query the MPAM PARTID for a particular resctrl group?
>
> Deliberately: no.

No problem at all. Thanks for the explanation.

> The general theme here is I don't trust user-space not to depend on
> any value exposed.

Fair point. I appreciate it.

>> I don't know how much an end user will care about PARTID values,
>> (so it's nice that the driver manages these implicitly), but for
>> me, while debugging this stuff, it would be nice to be able to
>> query them.
>
> This would only matter if you could somehow inspect the hardware - which you probably can
> - but users of deployed systems can't.
>
> Sorry if this isn't the answer you want, but I'm trying to only publish patches to
> kernel.org that I intend to upstream in some form.

No, that's fine. Like you said, with me doing bringup, I'm in a special
case, and also like you said, I can hack things to give me what I need
in the meantime.

>> I know that PARTID 0 is treated as reserved by the code, but is cpu
>> 0 given any special treatment?
>
> No - can you reproduce this on the latest branch?

I will check.

>> 4. The current schemata allows for cache portion, but not cache capacity
>
> See KNOWN_ISSUES:
> | Only features that match what resctrl already supports are supported.
> | This is very deliberate.
...
>> Is this due to a limitation in mapping MPAM to the current resctrl
>> interface?
>
> It is. Getting feature parity with x86 is the critical path to getting this upstream.
> Supporting other bits of MPAM can come next - we'd need a discussion with Intel about how
> any changes should be done, so that they can support them too if they ever have similar
> features.
>
> This conversation can't happen until we have some support upstream.

Got it. This approach makes sense to me, and it's good for me to
understand what limitations exist in the current implementation and why.

>> 5. Linked-list corruption with missing cache entries in PPTT
>>
>> At one point, I tried booting with the MPAM ACPI table populated
>> for my L3 cache, but without the necessary entries in the PPTT ACPI
>> table. The driver fell over with linked-list corruption, halting
>> Linux boot. I'll follow up this report with more details.
>
> This kind of thing won't have seen much testing. Any details you can
> share would help!

Yeah, I figured as much. Since I can reproduce this, I don't think it
should be too hard for me to dig in and root-cause the bug.

Thanks again for the quick response. I'll do my next round of testing
against more recent code, and I should be able to follow up against
specific patches for the couple of bugs I identified above, which I'll
look into more closely.

Beyond that, I hope to be able to provide some Reviewed-by and Tested-by
soon.

I see that you've been going several rounds on the earlier portions of
this patch set, (the parts that refactor resctrl to prepare for
things). I trust that you've got that part of the process in hand?
Otherwise, let me know if there's anything I can do to help there.

Again, I haven't been looking into details of those patches yet, just
testing to ensure they work, (and so far, the generic parts of resctrl
seem to be working just fine for me).

-Carl