Re: [PATCH v7 12/12] mm: multigenerational LRU: documentation

From: Mike Rapoport
Date: Wed Feb 23 2022 - 05:58:48 EST


On Mon, Feb 21, 2022 at 06:47:25PM -0700, Yu Zhao wrote:
> On Mon, Feb 21, 2022 at 2:02 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 15, 2022 at 08:22:10PM -0700, Yu Zhao wrote:
> > > > Please consider splitting "enable" and "features" attributes.
> > >
> > > How about s/Features/Components/?
> >
> > I meant to use two attributes:
> >
> > /sys/kernel/mm/lru_gen/enable for the main breaker, and
> > /sys/kernel/mm/lru_gen/features (or components) for the branch breakers
>
> It's a bit superfluous for my taste. I generally consider multiple
> items to fall into the same category if they can be expressed by a
> type of array, and I usually pack an array into a single file.
>
> From your last review, I gauged this would be too overloaded for your
> taste. So I'd be happy to make the change if you think two files look
> more intuitive from a user's perspective.

I do think that two attributes are more user-friendly, but I don't feel
strongly about it.

> > > > As for the descriptions, what is the user-visible effect of these features?
> > > > How different modes of clearing the access bit are reflected in, say, GUI
> > > > responsiveness, database TPS, or probability of OOM?
> > >
> > > These remain to be seen :) I just added these switches in v7, per Mel's
> > > request from the meeting we had. These were never tested in the field.
> >
> > I see :)
> >
> > It would be nice to have a description and/or examples of user-visible
> > effects once there is some insight into what these features do.
>
> How does the following sound?
>
> Clearing the accessed bit in large batches can theoretically cause
> lock contention (mmap_lock), and if it happens the 0x0002 switch can
> disable this feature. In this case the multigenerational LRU suffers a
> minor performance degradation.
> Clearing the accessed bit in non-leaf page table entries was only
> verified on Intel and AMD, and if it causes problems on other x86
> varieties the 0x0004 switch can disable this feature. In this case the
> multigenerational LRU suffers a negligible performance degradation.

LGTM
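
For readers of the archive, the two switches described above can be read as
bits in a mask. What follows is only a minimal sketch: the 0x0002 and 0x0004
values come from the quoted text, while the 0x0001 "main switch" value, the
constant names, and the helper functions are assumptions made for
illustration. None of this is the kernel's code.

```python
# Bit values 0x0002 and 0x0004 are taken from the text quoted above.
# MAIN_SWITCH = 0x0001 is an ASSUMPTION for illustration; this thread
# does not state the value of the main switch.
MAIN_SWITCH = 0x0001
BATCH_CLEAR = 0x0002    # clearing the accessed bit in large batches
NONLEAF_CLEAR = 0x0004  # clearing the accessed bit in non-leaf entries

# Hypothetical names, used only to make the output readable.
FEATURE_NAMES = [
    (MAIN_SWITCH, "main"),
    (BATCH_CLEAR, "batch_clear"),
    (NONLEAF_CLEAR, "nonleaf_clear"),
]


def disable(mask: int, feature: int) -> int:
    """Return mask with the given feature bit cleared."""
    return mask & ~feature


def describe(mask: int) -> list[str]:
    """List the names of the feature bits that are set in mask."""
    return [name for bit, name in FEATURE_NAMES if mask & bit]


if __name__ == "__main__":
    everything = MAIN_SWITCH | BATCH_CLEAR | NONLEAF_CLEAR
    # Turn off batched clearing, e.g. if mmap_lock contention shows up.
    reduced = disable(everything, BATCH_CLEAR)
    print(hex(reduced), describe(reduced))
```

With a single packed file, a user has to do this arithmetic before writing the
value; with two attributes the main switch would not be part of the mask.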

> > > > > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > > >
> > > > Is debugfs interface relevant only for datacenters?
> > >
> > > For the moment, yes.
> >
> > And what will happen if somebody uses these interfaces outside
> > datacenters? As soon as there is a sysfs interface, somebody will surely
> > play with it.
> >
> > I think the job schedulers might be the most important user of that
> > interface, but the documentation should not presume it is the only user.
>
> Other ideas are more like brainstorming than concrete use cases, e.g.,
> for desktop users, these interfaces can in theory speed up hibernation
> (suspend to disk); for VM users, they can again in theory support auto
> ballooning. These niches are really minor and less explored compared
> with the data center use cases which have been dominant.
>
> I was hoping we could focus on the essential and take one step at a
> time. Later on, if there is additional demand and resources, we can
> expand to cover more use cases.

Apparently I was not clear :)

I didn't mean that you should describe other use cases. Rather, I suggested
making the documentation more neutral, e.g. using "a user writes to this
file ..." instead of "job scheduler writes to a file ...". Or maybe add a
sentence at the beginning of the "Data centers" section, for instance:

Data centers
------------

+ Job schedulers are a representative example of multigenerational LRU
users.

Data centers want to optimize job scheduling (bin packing) to improve
memory utilization. Job schedulers need to estimate whether a server


--
Sincerely yours,
Mike.