Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers

From: Kent Overstreet
Date: Wed Nov 29 2023 - 23:15:01 EST


On Thu, Nov 30, 2023 at 11:42:45AM +0800, Qi Zheng wrote:
> > Similarly for shrinkers, we're not going to be printing all of them -
> > the patchset picks the top 10 by objects and prints those. Could
> > probably be ~4, there's fewer shrinkers than slabs; also if we can get
> > shrinkers to report on memory owned in bytes, that will help too with
> > deciding what information is pertinent.
>
> I'm not worried about the shrinker's general data. What I'm worried
> about is the shrinker's private data. Except for the corresponding
> developers, others don't know the meaning of the private statistical
> data, and we have no control over the printing quantity and form of
> the private data. This may indeed cause OOM log confusion and failure
> to automatically parse. For this, any thoughts?

If a shrinker is responsible for the OOM, then that private state is
exactly what is needed to debug the issue.

I explained this earlier - shrinkers can skip reclaiming objects for a
variety of reasons; in bcachefs's case that could be trylock() failure,
an IO in flight, the node being dirty, and more. Unlock the system inode
shrinker, it's much too expensive to move objects on and off the
shrinker list whenever they're touched. Again, this all comes from real
world experience.

The show_mem report is already full of numbers with zero explanation of
how they're relevant for debugging OOMS; we really need to improve how
that is presented as well.