Re: [RFC] memory tiering: use small chunk size and more tiers

From: Huang, Ying
Date: Wed Nov 02 2022 - 04:28:57 EST


Michal Hocko <mhocko@xxxxxxxx> writes:

> On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> Michal Hocko <mhocko@xxxxxxxx> writes:
>>
>> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> Michal Hocko <mhocko@xxxxxxxx> writes:
>> >>
>> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> > [...]
>> >> >> In the upstream implementation, 4 tiers are possible below DRAM. That's
>> >> >> enough for now. But in the long run, it may be better to define more.
>> >> >> 100 possible tiers below DRAM may be too extreme.
>> >> >
>> >> > I am just curious. Are any configurations with more than a couple of
>> >> > tiers even manageable? I mean applications have been struggling even
>> >> > with regular NUMA systems for years and the vast majority of them are
>> >> > largely NUMA unaware. How are they going to configure for a more
>> >> > complex system when a) there is no resource access control so whatever
>> >> > you aim for might not be available and b) in which situations there is
>> >> > going to be a demand only for a subset of tiers (GPU memory?) ?
>> >>
>> >> Sorry for the confusion. I think that there are only several (fewer than
>> >> 10) tiers in a system in practice. Yes, here, I suggested defining 100
>> >> (10 in the later text) POSSIBLE tiers below DRAM. My intention isn't to
>> >> manage a system with tens of memory tiers. Instead, my intention is to
>> >> avoid putting 2 memory types into one memory tier by accident, by making
>> >> the abstract distance range of each memory tier as small as possible.
>> >> The more possible memory tiers there are, the smaller the abstract
>> >> distance range of each memory tier.
>> >
>> > TBH I do not really understand how tweaking ranges helps anything.
>> > IIUC drivers are free to assign any abstract distance so they will clash
>> > without any higher level coordination.
>>
>> Yes. That's possible. Each memory tier corresponds to one abstract
>> distance range. The larger the range is, the higher the possibility of
>> clashing is. So I suggest making the abstract distance range smaller
>> to reduce the possibility of clashing.
>
> I am sorry but I really do not understand how the size of the range
> actually addresses a fundamental issue that each driver simply picks
> what it wants. Is there any enumeration defining the basic characteristics
> of each tier? How does a driver developer know which tier to assign its
> driver to?

A smaller range size does not guarantee anything. It only tries to
improve the default behavior.

Drivers are expected to assign the abstract distance based on the
memory latency, bandwidth, etc. The abstract distance range of a memory
tier therefore corresponds to a memory latency/bandwidth range too. So,
if the abstract distance range is smaller, two types of memory with
different latency/bandwidth are less likely to land in the same
abstract distance range.
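
As a rough illustration only (not the kernel code), assume a tier is
identified by the abstract distance with its low bits dropped, so each
tier covers 2^chunk_bits abstract distance values. The helper names,
the chunk sizes and the two abstract distances below are all made up
for the example.

```python
# Sketch only: the chunk sizes and abstract distances are illustrative
# assumptions, not values assigned by any real driver.

def tier_of(adistance: int, chunk_bits: int) -> int:
    """Return the tier index for an abstract distance.

    Assumes each tier covers 2**chunk_bits consecutive abstract
    distance values.
    """
    return adistance >> chunk_bits


def clashes(a: int, b: int, chunk_bits: int) -> bool:
    """True if two abstract distances fall into the same tier."""
    return tier_of(a, chunk_bits) == tier_of(b, chunk_bits)


# Two hypothetical memory types with different latency/bandwidth.
MEM_TYPE_A = 700
MEM_TYPE_B = 760

for chunk_bits in (7, 5):
    print(
        f"chunk size {1 << chunk_bits}: "
        f"A -> tier {tier_of(MEM_TYPE_A, chunk_bits)}, "
        f"B -> tier {tier_of(MEM_TYPE_B, chunk_bits)}, "
        f"clash = {clashes(MEM_TYPE_A, MEM_TYPE_B, chunk_bits)}"
    )
```

With the larger chunk the two types share one tier. With the smaller
chunk they end up in different tiers without any manual configuration.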

Clashing isn't a total disaster. We plan to provide a per-memory-type
knob to offset the abstract distance provided by the driver. Then, we
can move clashing memory types apart if necessary.
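
A minimal sketch of how such a knob could behave, assuming the
effective abstract distance is simply the driver value plus a
per-memory-type offset. The knob does not exist yet, so the class, the
field names and the numbers here are hypothetical.

```python
from dataclasses import dataclass

CHUNK_BITS = 7  # assumed: each tier covers 128 abstract distance values


@dataclass
class MemoryType:
    name: str
    driver_adistance: int  # value the driver would provide
    offset: int = 0        # hypothetical per-memory-type knob

    @property
    def adistance(self) -> int:
        """Effective abstract distance after applying the offset."""
        return self.driver_adistance + self.offset

    @property
    def tier(self) -> int:
        """Tier index, assuming tiers are fixed-size adistance chunks."""
        return self.adistance >> CHUNK_BITS


type_a = MemoryType("type_a", driver_adistance=700)
type_b = MemoryType("type_b", driver_adistance=760)

# Both driver-provided values fall into the same chunk, so they clash.
clash_before = type_a.tier == type_b.tier

# The administrator moves the slower type one chunk further away.
type_b.offset = 1 << CHUNK_BITS
clash_after = type_a.tier == type_b.tier

print(f"before: clash = {clash_before}")
print(f"after:  type_a tier {type_a.tier}, type_b tier {type_b.tier}, "
      f"clash = {clash_after}")
```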

Best Regards,
Huang, Ying