Re: State of the Page (August 2022)

From: Kirill A. Shutemov
Date: Fri Aug 12 2022 - 10:31:07 EST


On Fri, Aug 12, 2022 at 02:34:53PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 12, 2022 at 01:16:39PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> > > ==============================
> > > State Of The Page, August 2022
> > > ==============================
> > >
> > > I thought I'd write down where we are with struct page and where
> > > we're going, just to make sure we're all (still?) pulling in a similar
> > > direction.
> > >
> > > Destination
> > > ===========
> > >
> > > For some users, the size of struct page is simply too large. At 64
> > > bytes per 4KiB page, memmap occupies 1.6% of memory. If we can get
> > > struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> > > which is an acceptable overhead.
> >
> > Right. This is attractive. But it brings the cost of indirection.
>
> It does, but it also crams 8 pages into a single cacheline instead of
> occupying one cacheline per page.

If you really need info about these pages and reference their memdescs,
it is likely to be 9 cache lines (one for the pointers plus one per
memdesc) scattered across memory instead of 8 cache lines next to each
other in the same page.

And it's going to be two cache lines instead of one if we need info about
a single page. I think that is the most common case.
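
To make the counting above concrete, here is a rough sketch in C. It
assumes 64-byte cache lines, a line-aligned run of memmap entries, and a
worst case where every page points at its own memdesc on its own cache
line. The function names are made up for illustration; nothing here is
kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES   64u  /* assumed cache line size */
#define STRUCT_PAGE_BYTES 64u  /* struct page size quoted in the thread */
#define TAGGED_PTR_BYTES   8u  /* proposed per-page tagged pointer */

/*
 * Cache lines touched when reading the memmap entries of npages
 * consecutive pages, assuming the first entry is line-aligned.
 */
static inline size_t memmap_lines(size_t npages, size_t entry_bytes)
{
	assert(entry_bytes > 0);
	return (npages * entry_bytes + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Current scheme: all per-page info lives in struct page itself. */
static inline size_t lines_struct_page(size_t npages)
{
	return memmap_lines(npages, STRUCT_PAGE_BYTES);
}

/*
 * Proposed scheme, worst case (an assumption of this sketch): each page
 * has a distinct memdesc and no two memdescs share a cache line, so we
 * pay for the pointer lines plus one extra line per page.
 */
static inline size_t lines_memdesc_worst(size_t npages)
{
	return memmap_lines(npages, TAGGED_PTR_BYTES) + npages;
}
```

Under those assumptions, 8 pages cost 8 lines today and 9 with memdescs,
and a single page costs 1 line today and 2 with memdescs. Pages that
share a memdesc (a large folio, say) would do better than this worst
case.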

Initially, I thought we could offset the cost by caching memdescs instead
of struct page/folio, e.g. having the page cache store memdescs. But that
would require memdesc_to_pfn(), which is not possible unless we store the
pfn explicitly in the memdesc.
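
A toy sketch of why that is, in C. The types and helpers below
(toy_page, toy_memdesc, toy_page_to_pfn(), toy_memdesc_to_pfn()) are
hypothetical stand-ins, not the kernel's definitions, and the flat
memmap is a simplification of the real memory models.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-byte entry standing in for struct page in a flat memmap. */
struct toy_page {
	uint64_t words[8];
};
static_assert(sizeof(struct toy_page) == 64,
	      "toy_page models a 64-byte struct page");

/* Hypothetical memdesc, allocated separately from the memmap. */
struct toy_memdesc {
	unsigned long flags;
	unsigned long pfn;	/* the explicit field discussed above */
};

/*
 * With a flat memmap array, the pfn falls out of pointer arithmetic:
 * the entry's index in the array is its offset from the base pfn.
 */
static inline unsigned long toy_page_to_pfn(const struct toy_page *memmap,
					    unsigned long base_pfn,
					    const struct toy_page *page)
{
	return base_pfn + (unsigned long)(page - memmap);
}

/*
 * A memdesc's address says nothing about which page it describes, so
 * the only way back to a pfn is to read one stored in the descriptor.
 */
static inline unsigned long toy_memdesc_to_pfn(const struct toy_memdesc *md)
{
	return md->pfn;
}
```

The stored pfn grows every memdesc by a word, which eats into the memory
saving that motivates the change in the first place.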

I don't want to be a buzzkill. I like the idea a lot, but abstractions are
often costly. Getting it upstream without noticeable performance
regressions is going to be a challenge.

> > It can be especially painful for physical memory scanning. I guess we can
> > derive some info from memdesc type itself, like if it can be movable. But
> > still looks like an expensive change.
>
> I just don't think of physical memory scanning as something we do
> often, or in a performance-sensitive path. I'm OK with slowing down
> kcompactd if it makes walking the LRU list faster.
>
> > Do you have any estimation on how much CPU time we will pay to reduce
> > memory (and cache) overhead? RAM size tends to grow faster than IPC.
> > We need to make sure it is the right direction.
>
> I don't. I've heard colourful metaphors from the hyperscale crowd about
> how many more VMs they could sell, usually in terms of putting pallets
> of money in the parking lot and setting them on fire. But IPC isn't the
> right metric either, CPU performance is all about cache misses these days.

As I said above, I don't expect the new scheme to be cache-friendly
either.

--
Kiryl Shutsemau / Kirill A. Shutemov