Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()

From: Lance Yang
Date: Thu Jan 18 2024 - 20:46:55 EST


On Thu, Jan 18, 2024 at 10:59 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> On Thu, Jan 18, 2024 at 5:43 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > Dang, forgot to cc linux-api...
> >
> > On Thu 18-01-24 14:40:19, Michal Hocko wrote:
> > > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > > [...]
> > >
> > > > Before we discuss the semantics, let's focus on the use case.
> > >
> > > > Use Cases
> > > >
> > > > An immediate user of this new functionality is the Go runtime heap allocator
> > > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > > newly allocated chunk through mmap() or a reused chunk released by
> > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > > respectively. However, both approaches resulted in performance issues; for
> > > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
>
> Aside: The thought was a MADV_F_COLLAPSE_LIGHT _flag_; so it'd be
> process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT)

I apologize for the misunderstanding. I will provide the correct implementation
in version 3.
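
For concreteness, here is a minimal sketch of the call shape Zach describes:
MADV_COLLAPSE as the advice, with the LIGHT bit passed in the flags argument
of process_madvise(). Assumptions, none of which come from a merged kernel:
MADV_F_COLLAPSE_LIGHT does not exist in mainline and its bit value below is a
placeholder; HPAGE_SIZE assumes 2 MiB PMD-sized huge pages; collapse_light()
and is_hugepage_chunk() are hypothetical helper names used only for
illustration. The raw syscall is used so the sketch does not depend on a libc
wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25              /* mainline value since Linux 6.1 */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434            /* generic syscall number */
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440       /* generic syscall number */
#endif

/* HYPOTHETICAL: proposed flag, not in mainline; the bit is a placeholder. */
#define MADV_F_COLLAPSE_LIGHT (1U << 0)

/* ASSUMPTION: 2 MiB PMD-sized huge pages (the x86-64 default). */
#define HPAGE_SIZE ((size_t)2 << 20)

/* Nonzero if [addr, addr + len) is a non-empty, hugepage-aligned chunk. */
int is_hugepage_chunk(const void *addr, size_t len)
{
    return len != 0 &&
           ((uintptr_t)addr % HPAGE_SIZE) == 0 &&
           (len % HPAGE_SIZE) == 0;
}

/*
 * Attempt a "light" collapse of one hugepage-aligned chunk in the calling
 * process. Returns the number of bytes advised, or -1 with errno set.
 */
ssize_t collapse_light(void *addr, size_t len)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };
    ssize_t ret;
    int pidfd, saved_errno;

    if (!is_hugepage_chunk(addr, len)) {
        errno = EINVAL;
        return -1;
    }

    /* process_madvise() takes a pidfd, even when targeting ourselves. */
    pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
    if (pidfd < 0)
        return -1;

    /* The flag goes in the last argument; the advice stays MADV_COLLAPSE. */
    ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
                  MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT);

    saved_errno = errno;              /* close() may clobber errno */
    close(pidfd);
    errno = saved_errno;
    return ret;
}
```

A caller such as an allocator would treat any failure as "no huge page this
time" and carry on with base pages. As far as I know, current mainline
requires the flags argument to be 0, so a call like this should be expected
to fail with EINVAL until a flag of this kind is actually defined.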

BR,
Lance

>
> > > IIUC the primary reason is the cost of the huge page allocation which
> > > can be really high if the memory is heavily fragmented and it is called
> > > synchronously from the process directly, correct? Can that be worked
> > > around by process_madvise and performing the operation from a different
> > > context? Are there any other reasons to have a different mode?
> > >
> > > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE,
> > > > e.g. a non-blocking one that makes sure the caller doesn't block on
> > > > resource contention (be it locks or memory availability), because that
> > > > matches our non-blocking interfaces in other areas. But a LIGHT
> > > > operation sounds really vague: the exact semantics would be
> > > > implementation specific and might change over time. Non-blocking has
> > > > clear semantics, but it is not clear whether that is what you
> > > > really need/want.
>
> IIUC, the use case from Go is avoiding unbounded latency due to sync
> compaction in a context where that latency is unacceptable. Working w/
> them to understand how things can be improved -- it's possible the
> changes can occur entirely on their side, w/o any additional kernel
> support.
>
> The non-blocking case awkwardly sits between MADV_COLLAPSE today and
> khugepaged, esp. when the common case is that the allocation can probably
> be satisfied in the fast path.
>
> The suggestion for something like "LIGHT" was intentionally vague
> because it could allow for other optimizations / changes down the
> line, as you point out. I think that might be a win, vs tying it to a
> specific optimization (e.g. a MADV_F_COLLAPSE_NODEFRAG). But I
> could be alone on that front, given the design of
> /sys/kernel/mm/transparent_hugepage.
>
> But circling back, I agree w/ you that the first order of business is to
> iron out a real use case. As of right now, it's not clear something
> like this is required or helpful.
>
> Thanks,
> Zach
>
> > > > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> > > > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> > > > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> > > > [4] https://github.com/golang/go/issues/63334
> > > >
> > > > [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@xxxxxxxxx/
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > --
> > Michal Hocko
> > SUSE Labs