Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

From: Alistair Popple
Date: Thu Jul 27 2023 - 00:12:13 EST



"Huang, Ying" <ying.huang@xxxxxxxxx> writes:

> Alistair Popple <apopple@xxxxxxxxxx> writes:
>
>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>>
>>>>> And, I don't think that we are forced to use the general notifier
>>>>> chain interface in all memory device drivers. If the memory device
>>>>> driver has better understanding of the memory device, it can use other
>>>>> way to determine abstract distance. For example, a CXL memory device
>>>>> driver can identify abstract distance by itself. While other memory
>>>>> device drivers can use the general notifier chain interface at the
>>>>> same time.
>>>>
>>>> Whilst I think personally I would find that flexibility useful I am
>>>> concerned it means every driver will just end up divining it's own
>>>> distance rather than ensuring data in HMAT/CDAT/etc. is correct. That
>>>> would kind of defeat the purpose of it all then.
>>>
>>> But we have no way to enforce that too.
>>
>> Enforce that HMAT/CDAT/etc. is correct? Agree we can't enforce it, but
>> we can influence it. If drivers can easily ignore the notifier chain and
>> do their own thing that's what will happen.
>
> IMHO, both enforce HMAT/CDAT/etc is correct and enforce drivers to use
> general interface we provided. Anyway, we should try to make HMAT/CDAT
> works well, so drivers want to use them :-)

Exactly :-)

>>>>> While other memory device drivers can use the general notifier chain
>>>>> interface at the same time.
>>
>> How would that work in practice though? The abstract distance as far as
>> I can tell doesn't have any meaning other than establishing preferences
>> for memory demotion order. Therefore all calculations are relative to
>> the rest of the calculations on the system. So if a driver does it's own
>> thing how does it choose a sensible distance? IHMO the value here is in
>> coordinating all that through a standard interface, whether that is HMAT
>> or something else.
>
> Only if different algorithms follow the same basic principle. For
> example, the abstract distance of default DRAM nodes are fixed
> (MEMTIER_ADISTANCE_DRAM). The abstract distance of the memory device is
> in linear direct proportion to the memory latency and inversely
> proportional to the memory bandwidth. Use the memory latency and
> bandwidth of default DRAM nodes as base.
>
> HMAT and CDAT report the raw memory latency and bandwidth. If there are
> some other methods to report the raw memory latency and bandwidth, we
> can use them too.

Argh! So we could address my concerns by having drivers feed
latency/bandwidth numbers into a standard calculation algorithm right?
Ie. Rather than having drivers calculate abstract distance themselves we
have the notifier chains return the raw performance data from which the
abstract distance is derived.