Re: 回复: [PATCH] mm/compaction: add check mechanism to avoid cma alloc fail

From: Matthew Wilcox
Date: Sat Jan 27 2024 - 11:27:20 EST


On Sat, Jan 27, 2024 at 11:03:48AM +0000, Haiqiang Gong (龚海强) wrote:
> On Wed, 2024-01-24 at 18:40 +0000, Matthew Wilcox wrote:
> > On Wed, Jan 24, 2024 at 07:20:53AM +0000, Haiqiang Gong (龚海强) wrote:
> > > > I don't understand. You say that the memory isn't movable, but
> > > > then you say that it's migrated in. So it was movable, but it's
> > > > no longer movable after being moved once?
> > > Sorry for not expressing clearly
> > > When doing memory migration, the kernel will determine whether the
> > > current page can be moved based on the refcount and mapcount of the
> > > current page.
> > > This memory can be moved during kernel compaction. At this time,
> > > refcount is less than or equal to mapcount.
> > > After this memory is kcompacted and placed in the cma buffer, under
> > > certain special conditions, the refcount may be greater than the
> > > mapcount (ex:the current page is being used by fs), and then migrate
> > > will fail.
> >
> > But that's always true. Any page that is currently in use might have
> > its refcount temporarily incremented. There's nothing special about
> > pages that belong to a file. You've basically just prevented all
> > filesystem memory from being migrated to the CMA area, and that's
> > wrong.
> >
> Yes, we agree with you that the refcount may be temporarily
> incremented.
> Issues we have reproduced:
> The current page is migrated to the cma area by kcompactd, rather than
> allocated by the kernel memory allocator.
> Our opinion is that if a page cannot be allocated to the CMA area, then
> we should not put it in the CMA area when doing kernel migration. This
> seems more reasonable. Do you agree with this view or do you have any
> other suggestions?

That does seem reasonable. But I don't know if it helps you at all;
is there a type of allocation which is migratable but not allocatable
from the CMA area?

> > What's special about this page? Or were you just unlucky?
> We didn't find anything special about this page. During our debugging,
> we found that once a similar problem occurs in the current page, it can
> no longer be migrated (retrying after an hour will not work).

Perhaps you can find out more information about this particular page; who
allocated it, why was it migratable initially but not the second time?
Perhaps something happens to this page to keep the refcount high, and
if we can find out when that will happen, we can migrate it out of the CMA
area before incrementing the refcount.
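
To make the refcount question concrete, here is a toy userspace model of
the kind of check being discussed. It is not kernel code: struct toy_page,
toy_expected_refs(), toy_can_migrate() and toy_demo() are made-up names for
illustration, and the real migration path is more involved. The assumption
is that a page can only be migrated while its reference count matches what
the known holders account for, so a temporary extra reference blocks one
attempt, while a reference that is never dropped blocks every retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's struct page. */
struct toy_page {
	int refcount;      /* all references currently held */
	int mapcount;      /* references held by page table mappings */
	bool in_pagecache; /* assumed: the page cache holds one reference */
};

/* References we can account for: mappings plus the page cache. */
static int toy_expected_refs(const struct toy_page *p)
{
	return p->mapcount + (p->in_pagecache ? 1 : 0);
}

/* Migration is refused while anyone holds an unaccounted reference. */
static bool toy_can_migrate(const struct toy_page *p)
{
	return p->refcount == toy_expected_refs(p);
}

/* Walk through the two cases: a temporary pin and a persistent one. */
static void toy_demo(void)
{
	struct toy_page p = { .refcount = 2, .mapcount = 1, .in_pagecache = true };

	assert(toy_can_migrate(&p));  /* only the expected holders */

	p.refcount++;                 /* some user takes a reference */
	assert(!toy_can_migrate(&p)); /* this attempt fails */

	p.refcount--;                 /* temporary pin: dropped again */
	assert(toy_can_migrate(&p));  /* a retry succeeds */

	p.refcount++;                 /* persistent pin: never dropped */
	assert(!toy_can_migrate(&p)); /* every retry keeps failing */
}
```

In this model, a page that still cannot be migrated after an hour looks
like the last case, which is why it matters who took the extra reference
and when.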