Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave

From: Huang, Ying
Date: Thu Nov 02 2023 - 02:30:08 EST


Michal Hocko <mhocko@xxxxxxxx> writes:

> On Tue 31-10-23 12:22:16, Johannes Weiner wrote:
>> On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
> [...]
>> > Is there any specific reason for not having a new interleave interface
>> > which defines weights for the nodemask? Is this because the policy
>> > itself is very dynamic or is this more driven by simplicity of use?
>>
>> A downside of *requiring* weights to be paired with the mempolicy is
>> that it's then the application that would have to figure out the
>> weights dynamically, instead of having a static host configuration. A
>> policy of "I want to be spread for optimal bus bandwidth" translates
>> between different hardware configurations, but optimal weights will
>> vary depending on the type of machine a job runs on.
>
> I can imagine this could be achieved by numactl(8) so that the process
> management tool could set this up for the process on the start up. Sure
> it wouldn't be very dynamic after then and that is why I was asking
> about how dynamic the situation might be in practice.
>
>> That doesn't mean there couldn't be usecases for having weights as
>> policy as well in other scenarios, like you allude to above. It's just
>> so far such usecases haven't really materialized or spelled out
>> concretely. Maybe we just want both - a global default, and the
>> ability to override it locally. Could you elaborate on the 'get what
>> you pay for' usecase you mentioned?
>
> This is more or less just an idea that came first to my mind when
> hearing about bus bandwidth optimizations. I suspect that sooner or
> later we just learn about usecases where the optimization function
> maximizes not only bandwidth but also cost for that bandwidth. Consider
> a hosting system serving different workloads each paying different
> QoS.

I don't think a pure software solution can enforce memory bandwidth
allocation. For that, we will need something like MBA (Memory Bandwidth
Allocation), as described at the following URL:

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-memory-bandwidth-allocation.html

At least something like MBM (Memory Bandwidth Monitoring), as described
at the following URL, will be needed.

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-memory-bandwidth-monitoring.html

The interleave solution helps only cooperative workloads.
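
To make the "cooperative only" point concrete, here is a minimal sketch of
weighted interleave as plain weighted round-robin page placement. It is not
the code from the patch series; the function name, the node ids, and the
3:1 weights are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids in a repeating weighted round-robin order.

    weights: dict of node id -> positive integer weight.
    The values used below are hypothetical, not taken from the patches.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty map of positive integers")
    while True:
        # Visit nodes in id order; each node receives `weight` consecutive pages.
        for node, weight in sorted(weights.items()):
            for _ in range(weight):
                yield node


# Hypothetical two-node machine: node 0 has weight 3, node 1 has weight 1.
# Place 400 pages and count how many land on each node.
placement = Counter(islice(weighted_interleave({0: 3, 1: 1}), 400))
print(dict(placement))
```

The sketch only decides where pages are placed. Nothing in it limits how
fast a task later accesses those pages, which is why enforcement would
still need something like MBA.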

> Do I know about anybody requiring that now? No! But we should really
> test the proposed interface for potential future extensions. If such an
> extension is not reasonable and/or we can achieve that by different
> means then great.

--
Best Regards,
Huang, Ying