Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers

From: Huang, Ying
Date: Thu Jan 04 2024 - 01:07:25 EST


Srinivasulu Thanneeru <sthanneeru@xxxxxxxxxx> writes:

>> -----Original Message-----
>> From: Huang, Ying <ying.huang@xxxxxxxxx>
>> Sent: Wednesday, January 3, 2024 2:00 PM
>> To: Srinivasulu Thanneeru <sthanneeru@xxxxxxxxxx>
>> Cc: gregory.price <gregory.price@xxxxxxxxxxxx>; Srinivasulu Opensrc
>> <sthanneeru.opensrc@xxxxxxxxxx>; linux-cxl@xxxxxxxxxxxxxxx; linux-
>> mm@xxxxxxxxx; aneesh.kumar@xxxxxxxxxxxxx; dan.j.williams@xxxxxxxxx;
>> mhocko@xxxxxxxx; tj@xxxxxxxxxx; john@xxxxxxxxxxxxxx; Eishan Mirakhur
>> <emirakhur@xxxxxxxxxx>; Vinicius Tavares Petrucci
>> <vtavarespetr@xxxxxxxxxx>; Ravis OpenSrc <Ravis.OpenSrc@xxxxxxxxxx>;
>> Jonathan.Cameron@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Johannes
>> Weiner <hannes@xxxxxxxxxxx>; Wei Xu <weixugc@xxxxxxxxxx>
>> Subject: Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory
>> tiers
>>
>> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
>> you recognize the sender and were expecting this message.
>>
>>
>> Srinivasulu Thanneeru <sthanneeru@xxxxxxxxxx> writes:
>>
>> > Micron Confidential
>> >
>> >
>> >
>> > Micron Confidential
>> >> -----Original Message-----
>> >> From: Huang, Ying <ying.huang@xxxxxxxxx>
>> >> Sent: Wednesday, January 3, 2024 11:38 AM
>> >> To: Srinivasulu Thanneeru <sthanneeru@xxxxxxxxxx>
>> >> Cc: gregory.price <gregory.price@xxxxxxxxxxxx>; Srinivasulu Opensrc
>> >> <sthanneeru.opensrc@xxxxxxxxxx>; linux-cxl@xxxxxxxxxxxxxxx; linux-
>> >> mm@xxxxxxxxx; aneesh.kumar@xxxxxxxxxxxxx;
>> dan.j.williams@xxxxxxxxx;
>> >> mhocko@xxxxxxxx; tj@xxxxxxxxxx; john@xxxxxxxxxxxxxx; Eishan Mirakhur
>> >> <emirakhur@xxxxxxxxxx>; Vinicius Tavares Petrucci
>> >> <vtavarespetr@xxxxxxxxxx>; Ravis OpenSrc
>> <Ravis.OpenSrc@xxxxxxxxxx>;
>> >> Jonathan.Cameron@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Johannes
>> >> Weiner <hannes@xxxxxxxxxxx>; Wei Xu <weixugc@xxxxxxxxxx>
>> >> Subject: Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between
>> memory
>> >> tiers
>> >>
>> >> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
>> >> you recognize the sender and were expecting this message.
>> >>
>> >>
>> >> Srinivasulu Thanneeru <sthanneeru@xxxxxxxxxx> writes:
>> >>
>> >> > Micron Confidential
>> >> >
>> >> > Hi Huang, Ying,
>> >> >
>> >> > My apologies for wrong mail reply format, my mail client settings got
>> >> changed on my PC.
>> >> > Please find comments bellow inline.
>> >> >
>> >> > Regards,
>> >> > Srini
>> >> >
>> >> >
>> >> > Micron Confidential
>> >> >> -----Original Message-----
>> >> >> From: Huang, Ying <ying.huang@xxxxxxxxx>
>> >> >> Sent: Monday, December 18, 2023 11:26 AM
>> >> >> To: gregory.price <gregory.price@xxxxxxxxxxxx>
>> >> >> Cc: Srinivasulu Opensrc <sthanneeru.opensrc@xxxxxxxxxx>; linux-
>> >> >> cxl@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Srinivasulu Thanneeru
>> >> >> <sthanneeru@xxxxxxxxxx>; aneesh.kumar@xxxxxxxxxxxxx;
>> >> >> dan.j.williams@xxxxxxxxx; mhocko@xxxxxxxx; tj@xxxxxxxxxx;
>> >> >> john@xxxxxxxxxxxxxx; Eishan Mirakhur <emirakhur@xxxxxxxxxx>;
>> Vinicius
>> >> >> Tavares Petrucci <vtavarespetr@xxxxxxxxxx>; Ravis OpenSrc
>> >> >> <Ravis.OpenSrc@xxxxxxxxxx>; Jonathan.Cameron@xxxxxxxxxx;
>> linux-
>> >> >> kernel@xxxxxxxxxxxxxxx; Johannes Weiner <hannes@xxxxxxxxxxx>; Wei
>> Xu
>> >> >> <weixugc@xxxxxxxxxx>
>> >> >> Subject: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory
>> >> tiers
>> >> >>
>> >> >> CAUTION: EXTERNAL EMAIL. Do not click links or open attachments
>> unless
>> >> >> you recognize the sender and were expecting this message.
>> >> >>
>> >> >>
>> >> >> Gregory Price <gregory.price@xxxxxxxxxxxx> writes:
>> >> >>
>> >> >> > On Fri, Dec 15, 2023 at 01:02:59PM +0800, Huang, Ying wrote:
>> >> >> >> <sthanneeru.opensrc@xxxxxxxxxx> writes:
>> >> >> >>
>> >> >> >> > =============
>> >> >> >> > Version Notes:
>> >> >> >> >
>> >> >> >> > V2 : Changed interface to memtier_override from adistance_offset.
>> >> >> >> > memtier_override was recommended by
>> >> >> >> > 1. John Groves <john@xxxxxxxxxxxxxx>
>> >> >> >> > 2. Ravi Shankar <ravis.opensrc@xxxxxxxxxx>
>> >> >> >> > 3. Brice Goglin <Brice.Goglin@xxxxxxxx>
>> >> >> >>
>> >> >> >> It appears that you ignored my comments for V1 as follows ...
>> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>> https://lore.k/
>> %2F&data=05%7C02%7Csthanneeru%40micron.com%7Ce9e04d25ea7540100
>> cf308dc0c366eb1%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C0%7C63
>> 8398675187014390%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
>> &sdata=k6J1wxcuHTwR9eoD9Yz137bkn6wt1L9zpf5YaOjoIqA%3D&reserved=0
>> >>
>> %2F&data=05%7C02%7Csthanneeru%40micron.com%7C3e5d38eb47be463c2
>> >>
>> 95c08dc0c229d22%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C0%7C63
>> >>
>> 8398590664228240%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>> >>
>> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
>> >>
>> &sdata=7fPxb1YYR2tZ0v2FB1vlXnMJFcI%2Fr9HT2%2BUD1MNUd%2FI%3D&re
>> >> served=0
>> >> >> ernel.org%2Flkml%2F87o7f62vur.fsf%40yhuang6-
>> >> >>
>> >>
>> desk2.ccr.corp.intel.com%2F&data=05%7C02%7Csthanneeru%40micron.com
>> >> >>
>> >>
>> %7C5e614e5f028342b6b59c08dbff8e3e37%7Cf38a5ecd28134862b11bac1d56
>> >> >>
>> >>
>> 3c806f%7C0%7C0%7C638384758666895965%7CUnknown%7CTWFpbGZsb3d
>> >> >>
>> >>
>> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
>> >> >>
>> >>
>> D%7C3000%7C%7C%7C&sdata=OpMkYCar%2Fv8uHb7AvXbmaNltnXeTvcNUTi
>> >> >> bLhwV12Fg%3D&reserved=0
>> >> >
>> >> > Thank you, Huang, Ying for pointing to this.
>> >> >
>> >>
>> https://lpc.ev/
>> %2F&data=05%7C02%7Csthanneeru%40micron.com%7Ce9e04d25ea7540100
>> cf308dc0c366eb1%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C0%7C63
>> 8398675187014390%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
>> &sdata=%2F0AW8RYpTIa7%2FiScnkzmmTeAE9TYqjsuWWjTuxBPptk%3D&rese
>> rved=0
>> >>
>> ents%2Fevent%2F16%2Fcontributions%2F1209%2Fattachments%2F1042%2F1
>> >>
>> 995%2FLive%2520In%2520a%2520World%2520With%2520Multiple%2520Me
>> >>
>> mory%2520Types.pdf&data=05%7C02%7Csthanneeru%40micron.com%7C3e
>> >>
>> 5d38eb47be463c295c08dc0c229d22%7Cf38a5ecd28134862b11bac1d563c806
>> >>
>> f%7C0%7C0%7C638398590664228240%7CUnknown%7CTWFpbGZsb3d8eyJW
>> >>
>> IjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3
>> >>
>> 000%7C%7C%7C&sdata=1fGraxff7%2F1hNaE0an0xEudSKSUvaF3HgClMkmdC7
>> >> n8%3D&reserved=0
>> >> >
>> >> > In the presentation above, the adistance_offsets are per memtype.
>> >> > We believe that adistance_offset per node is more suitable and flexible.
>> >> > since we can change it per node. If we keep adistance_offset per
>> memtype,
>> >> > then we cannot change it for a specific node of a given memtype.
>> >> >
>> >> >> >>
>> >> >>
>> >>
>> https://lore.k/
>> %2F&data=05%7C02%7Csthanneeru%40micron.com%7Ce9e04d25ea7540100
>> cf308dc0c366eb1%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C0%7C63
>> 8398675187014390%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
>> &sdata=k6J1wxcuHTwR9eoD9Yz137bkn6wt1L9zpf5YaOjoIqA%3D&reserved=0
>> >>
>> %2F&data=05%7C02%7Csthanneeru%40micron.com%7C3e5d38eb47be463c2
>> >>
>> 95c08dc0c229d22%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C0%7C63
>> >>
>> 8398590664228240%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>> >>
>> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
>> >>
>> &sdata=7fPxb1YYR2tZ0v2FB1vlXnMJFcI%2Fr9HT2%2BUD1MNUd%2FI%3D&re
>> >> served=0
>> >> >> ernel.org%2Flkml%2F87jzpt2ft5.fsf%40yhuang6-
>> >> >>
>> >>
>> desk2.ccr.corp.intel.com%2F&data=05%7C02%7Csthanneeru%40micron.com
>> >> >>
>> >>
>> %7C5e614e5f028342b6b59c08dbff8e3e37%7Cf38a5ecd28134862b11bac1d56
>> >> >>
>> >>
>> 3c806f%7C0%7C0%7C638384758666895965%7CUnknown%7CTWFpbGZsb3d
>> >> >>
>> >>
>> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
>> >> >>
>> >>
>> D%7C3000%7C%7C%7C&sdata=O0%2B6T%2FgU0TicCEYBac%2FAyjOLwAeouh
>> >> >> D%2BcMI%2BflOsI1M%3D&reserved=0
>> >> >
>> >> > Yes, memory_type would be grouping the related memories together as
>> >> single tier.
>> >> > We should also have a flexibility to move nodes between tiers, to
>> address
>> >> the issues.
>> >> > described in use cases above.
>> >>
>> >> We don't pursue absolute flexibility. We add necessary flexibility
>> >> only. Why do you need this kind of flexibility? Can you provide some
>> >> use cases where memory_type based "adistance_offset" doesn't work?
>> >
>> > - /sys/devices/virtual/memory_type/memory_type/ adistance_offset
>> > memory_type based "adistance_offset will provide a way to move all nodes
>> of same memory_type (e.g. all cxl nodes)
>> > to different tier.
>>
>> We will not put the CXL nodes with different performance metrics in one
>> memory_type. If so, do you still need to move one of them?
>
> From https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf
> abstract_distance_offset: override by users to deal with firmware issue.
>
> say firmware can configure the cxl node into wrong tiers, similar to
> that it may also configure all cxl nodes into single memtype, hence
> all these nodes can fall into a single wrong tier.
> In this case, per node adistance_offset would be good to have ?

I think that it's better to fix the error firmware if possible. And
these are only theoretical, not practical issues. Do you have some
practical issues?

I understand that users may want to move nodes between memory tiers for
different policy choices. For that, memory_type based adistance_offset
should be good.

> --
> Srini
>> > Whereas /sys/devices/system/node/node2/memtier_override provide a
>> way migrate a node from one tier to another.
>> > Considering a case where we would like to move two cxl nodes into two
>> different tiers in future.
>> > So, I thought it would be good to have flexibility at node level instead of at
>> memory_type.

--
Best Regards,
Huang, Ying