Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range

From: David Hildenbrand
Date: Tue Jan 30 2024 - 05:43:48 EST


On 29.01.24 23:46, Mike Snitzer wrote:
On Mon, Jan 29 2024 at 5:12P -0500,
Dave Chinner <david@xxxxxxxxxxxxx> wrote:

On Mon, Jan 29, 2024 at 12:19:02PM -0500, Mike Snitzer wrote:
While I'm sure this legacy application would love to not have to
change its code at all, I think we can all agree that we need to just
focus on how best to advise applications that have mixed workloads
accomplish efficient mmap+read of both sequential and random.

To that end, I heard Dave clearly suggest 2 things:

1) update MADV/FADV_SEQUENTIAL to set file->f_ra.ra_pages to
bdi->io_pages, not bdi->ra_pages * 2

2) Have the application first issue MADV_SEQUENTIAL to convey that
the following MADV_WILLNEED is for sequential file load (so it is
desirable to use larger ra_pages)

This overrides the default of bdi->ra_pages and _should_ provide the
required per-file duality of control for readahead, correct?

I just discovered MADV_POPULATE_READ - see my reply to Ming
up-thread about that. The application should use that instead of
MADV_WILLNEED because it gives cache population guarantees that
WILLNEED doesn't. Then we can look at optimising the performance of
MADV_POPULATE_READ (if needed) as there is constrained scope we can
optimise within in ways that we cannot do with WILLNEED.

Nice find! Given commit 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables"), I've cc'd David
Hildenbrand just so he's in the loop.

Thanks for CCing me.

MADV_POPULATE_READ is indeed different; it doesn't give hints (not "might be a good idea to read some pages" like MADV_WILLNEED documents), it forces swapin/read/.../.

In a sense, MADV_POPULATE_READ is similar to simply reading one byte from each PTE, triggering page faults, but without actually reading from the target pages.

MADV_POPULATE_READ has a conceptual benefit: we know exactly how much memory user space wants to have populated (which range). In contrast, page faults contain no such hints and we have to guess based on historical behavior. One could use that range information to *not* do any faultaround/readahead when we come via MADV_POPULATE_READ, and really only populate the range of interest.

Further, one can use that range information to allocate larger folios, without having to guess where placement of a large folio is reasonable, and which size we should use.
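
To make the comparison with "read one byte per page" concrete, here is a rough user-space sketch. populate_read() is a hypothetical helper, not something from this thread or from any library; the fallback define is only for libc headers that predate the flag, and treating EINVAL as "kernel does not know MADV_POPULATE_READ" is an assumption (EINVAL can have other causes, such as special mappings).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22   /* Linux uapi value, for older libc headers */
#endif

/*
 * Hypothetical helper: make [addr, addr + len) readable without further
 * page faults. addr is assumed to be page-aligned, as madvise() requires.
 */
static int populate_read(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert((uintptr_t)addr % page == 0);

    /* Preferred path: the kernel is told the exact range up front. */
    if (madvise(addr, len, MADV_POPULATE_READ) == 0)
        return 0;

    /* Assumption: EINVAL here means the kernel lacks MADV_POPULATE_READ. */
    if (errno != EINVAL)
        return -1;

    /* Fallback: touch one byte per page; each access may fault on its own. */
    volatile const unsigned char *p = addr;
    for (size_t off = 0; off < len; off += page)
        (void)p[off];

    return 0;
}
```

With the madvise() path the kernel sees the whole range in one call; with the fallback loop it only ever sees individual faults and has to guess.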


FYI, I proactively raised feedback and questions to the reporter of
this issue:
CONTEXT: madvise(WILLNEED) doesn't convey the nature of the access,
sequential vs random, just the range that may be accessed.

Indeed. The "problem" with MADV_SEQUENTIAL/MADV_RANDOM is that it will fragment/split VMAs. So applying it to smaller chunks (like one would do with MADV_WILLNEED) is likely not a good option.
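
A small sketch to observe the splitting from user space, assuming /proc/self/maps is available. count_vmas() and advise_middle_page() are hypothetical helper names for illustration only. It maps a few anonymous pages, applies MADV_SEQUENTIAL to just the middle page, and counts the /proc/self/maps entries overlapping the mapping before and after.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count /proc/self/maps entries (VMAs) overlapping [start, end). */
static int count_vmas(uintptr_t start, uintptr_t end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[8192];
    int n = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long lo, hi;

        /* Each entry starts with "<start>-<end>" in hex. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && lo < end && hi > start)
            n++;
    }
    fclose(f);
    return n;
}

/*
 * Map npages anonymous pages, apply MADV_SEQUENTIAL to the middle page
 * only, and report the VMA count over the mapping before and after.
 */
static int advise_middle_page(size_t npages, int *before, int *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *base;
    int ret = -1;

    assert(npages >= 3);

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    *before = count_vmas((uintptr_t)base, (uintptr_t)base + len);
    if (*before >= 0 &&
        madvise(base + (npages / 2) * page, page, MADV_SEQUENTIAL) == 0) {
        *after = count_vmas((uintptr_t)base, (uintptr_t)base + len);
        if (*after >= 0)
            ret = 0;
    }

    munmap(base, len);
    return ret;
}
```

On a regular Linux kernel I would expect the count to go up after the partial madvise(), since the advised page ends up in a VMA of its own; doing that for many small chunks multiplies the number of VMAs accordingly.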

--
Cheers,

David / dhildenb