Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()

From: Lance Yang
Date: Fri Jan 19 2024 - 21:09:57 EST


On Fri, Jan 19, 2024 at 8:51 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Fri 19-01-24 10:03:05, Lance Yang wrote:
> > Hey Michal,
> >
> > Thanks for taking the time to review!
> >
> > On Thu, Jan 18, 2024 at 9:40 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > > [...]
> > >
> > > before we discuss the semantic, let's focus on the usecase.
> > >
> > > > Use Cases
> > > >
> > > > An immediate user of this new functionality is the Go runtime heap allocator
> > > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > > newly allocated chunk through mmap() or a reused chunk released by
> > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > > respectively. However, both approaches resulted in performance issues; for
> > > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
> > >
> > > IIUC the primary reason is the cost of the huge page allocation which
> > > can be really high if the memory is heavily fragmented and it is called
> > > synchronously from the process directly, correct? Can that be worked
> >
> > Yes, that's correct.
> >
> > > around by process_madvise and performing the operation from a different
> > > context? Are there any other reasons to have a different mode?
> >
> > In latency-sensitive scenarios, some applications aim to enhance performance
> > by utilizing huge pages as much as possible. At the same time, in case of
> > allocation failure, they prefer a quick return without triggering direct memory
> > reclamation and compaction.
>
> Could you elaborate some more on why?
>
> > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > > e.g. non blocking one to make sure that the caller doesn't really block
> > > on resource contention (be it locks or memory availability) because that
> > > matches our non-blocking interface in other areas but having a LIGHT
> > > operation sounds really vague and the exact semantic would be
> > > implementation specific and might change over time. Non-blocking has a
> > > clear semantic but it is not really clear whether that is what you
> > > really need/want.
> >
> > Could you provide me with some suggestions regarding the naming of a
> > more relaxed (opportunistic) MADV_COLLAPSE?
>
> Naming is not all that important at this stage (it could be
> MADV_COLLAPSE_NOBLOCK for example). The primary question is whether
> non-blocking in general is the desired behavior or the implementation
> should try but not too hard.

Hey Michal,

Thanks for your suggestion!

Having the implementation try, but not too hard, matches the behavior
I am after. A strictly non-blocking mode is also a great idea. Perhaps
in the future we can add a MADV_F_COLLAPSE_NOBLOCK flag for scenarios
where latency is extremely critical.

Thanks again,
Lance
>
> --
> Michal Hocko
> SUSE Labs