Re: [PATCH v7 12/12] mm: multigenerational LRU: documentation

From: Yu Zhao
Date: Wed Feb 23 2022 - 16:23:08 EST


On Wed, Feb 23, 2022 at 3:58 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> On Mon, Feb 21, 2022 at 06:47:25PM -0700, Yu Zhao wrote:
> > On Mon, Feb 21, 2022 at 2:02 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 15, 2022 at 08:22:10PM -0700, Yu Zhao wrote:
> > > > > Please consider splitting "enable" and "features" attributes.
> > > >
> > > > How about s/Features/Components/?
> > >
> > > I meant to use two attributes:
> > >
> > > /sys/kernel/mm/lru_gen/enable for the main breaker, and
> > > /sys/kernel/mm/lru_gen/features (or components) for the branch breakers
> >
> > It's a bit superfluous for my taste. I generally consider multiple
> > items to fall into the same category if they can be expressed by a
> > type of array, and I usually pack an array into a single file.
> >
> > From your last review, I gauged this would be too overloaded for your
> > taste. So I'd be happy to make the change if you think two files look
> > more intuitive from a user's perspective.
>
> I do think that two attributes are more user-friendly, but I don't feel
> strongly about it.
>
> > > > > As for the descriptions, what is the user-visible effect of these features?
> > > > > How different modes of clearing the access bit are reflected in, say, GUI
> > > > > responsiveness, database TPS, or probability of OOM?
> > > >
> > > > These remain to be seen :) I just added these switches in v7, per Mel's
> > > > request from the meeting we had. These were never tested in the field.
> > >
> > > I see :)
> > >
> > > It would be nice to have a description and/or examples of user-visible
> > > effects once there is some insight into what these features do.
> >
> > How does the following sound?
> >
> > Clearing the accessed bit in large batches can theoretically cause
> > lock contention (mmap_lock), and if it happens the 0x0002 switch can
> > disable this feature. In this case the multigenerational LRU suffers a
> > minor performance degradation.
> > Clearing the accessed bit in non-leaf page table entries was only
> > verified on Intel and AMD, and if it causes problems on other x86
> > varieties the 0x0004 switch can disable this feature. In this case the
> > multigenerational LRU suffers a negligible performance degradation.
>
> LGTM
>
> > > > > > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > > > >
> > > > > Is the debugfs interface relevant only for datacenters?
> > > >
> > > > For the moment, yes.
> > >
> > > And what will happen if somebody uses these interfaces outside
> > > datacenters? As soon as there is a sysfs interface, somebody will surely
> > > play with it.
> > >
> > > I think the job schedulers might be the most important user of that
> > > interface, but the documentation should not presume it is the only user.
> >
> > Other ideas are more like brainstorming than concrete use cases, e.g.,
> > for desktop users, these interfaces can in theory speed up hibernation
> > (suspend to disk); for VM users, they can again in theory support auto
> > ballooning. These niches are really minor and less explored compared
> > with the data center use cases which have been dominant.
> >
> > I was hoping we could focus on the essential and take one step at a
> > time. Later on, if there is additional demand and resources, we can
> > expand to cover more use cases.
>
> Apparently I was not clear :)
>
> I didn't mean that you should describe other use-cases, I rather suggested
> to make the documentation more neutral, e.g. using "a user writes to this
> file ..." instead of "job scheduler writes to a file ...". Or maybe add a
> sentence in the beginning of the "Data centers" section, for instance:
>
> Data centers
> ------------
>
> + A representative example of multigenerational LRU users are job
> schedulers.
>
> Data centers want to optimize job scheduling (bin packing) to improve
> memory utilizations. Job schedulers need to estimate whether a server

Yes, that makes sense. Will do. Thanks.
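
For readers following the switch discussion quoted above, the 0x0002 and
0x0004 values can be read as bits in one packed mask, as in the single-file
layout. Below is a minimal Python sketch under that assumption. The constant
and function names are hypothetical and not taken from the patch, only the two
values named in the thread are modeled, and nothing here reads or writes sysfs.

```python
# Hypothetical names; only the values 0x0002 and 0x0004 come from the thread.
BATCH_CLEAR = 0x0002    # clear the accessed bit in large batches
NONLEAF_CLEAR = 0x0004  # clear the accessed bit in non-leaf page table entries

SWITCHES = {
    BATCH_CLEAR: "batched clearing of the accessed bit",
    NONLEAF_CLEAR: "clearing the accessed bit in non-leaf page table entries",
}


def describe(mask):
    """Return descriptions of the switches that are set in mask."""
    return [text for bit, text in SWITCHES.items() if mask & bit]


def disable(mask, bit):
    """Return mask with the given switch bit cleared; other bits are kept."""
    return mask & ~bit


if __name__ == "__main__":
    mask = BATCH_CLEAR | NONLEAF_CLEAR
    # Turn off batched clearing, e.g. if mmap_lock contention shows up.
    mask = disable(mask, BATCH_CLEAR)
    print(format(mask, "#06x"), describe(mask))
```

Clearing one bit leaves the other switches untouched, which is the property
the packed-array argument relies on. A two-file layout would separate the main
switch from such a mask but would not change this arithmetic.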