Re: Re: [PATCH] mm/compaction: add check mechanism to avoid cma alloc fail

From: Haiqiang Gong (龚海强)
Date: Sat Jan 27 2024 - 06:04:16 EST


On Wed, 2024-01-24 at 18:40 +0000, Matthew Wilcox wrote:
> On Wed, Jan 24, 2024 at 07:20:53AM +0000, Haiqiang Gong (龚海强) wrote:
> > > I don't understand. You say that the memory isn't movable, but then you
> > > say that it's migrated in. So it was movable, but it's no longer
> > > movable after being moved once?
> > Sorry for not expressing this clearly.
> > During memory migration, the kernel decides whether the current page
> > can be moved based on its refcount and mapcount.
> > This memory can be moved during kernel compaction; at that time, the
> > refcount is less than or equal to the mapcount.
> > After kcompactd has placed this memory in the CMA buffer, under
> > certain special conditions the refcount may be greater than the
> > mapcount (e.g. the page is being used by the filesystem), and then
> > migration fails.
>
> But that's always true. Any page that is currently in use might have
> its refcount temporarily incremented. There's nothing special about
> pages that belong to a file. You've basically just prevented all
> filesystem memory from being migrated to the CMA area, and that's
> wrong.
>
Yes, we agree with you that the refcount may be temporarily incremented.
In the issue we reproduced, the page was migrated into the CMA area by
kcompactd rather than allocated there by the kernel memory allocator.
Our view is that if a page could not have been allocated from the CMA
area, then kernel migration should not place it in the CMA area either.
This seems more reasonable. Do you agree with this view, or do you have
any other suggestions?
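A minimal sketch of the proposed policy, assuming a simplified model: `struct page_model`, its `cma_ok` flag, `struct free_block`, and `pick_target()` are hypothetical names, not kernel code. In the kernel, CMA pageblocks carry the MIGRATE_CMA migratetype; here that is reduced to `BLOCK_CMA`. The idea is that compaction skips a CMA target whenever the allocator would not have placed the source page in CMA in the first place.

```c
#include <stdbool.h>

/* Hypothetical, simplified model for illustration only: not kernel code. */
enum block_type { BLOCK_MOVABLE, BLOCK_CMA };

struct page_model {
    bool cma_ok;  /* hypothetical: could the allocator have put this page in CMA? */
};

struct free_block {
    enum block_type type;
    bool free;
};

/* Return the index of the first free block acceptable for this page, or -1. */
static int pick_target(const struct page_model *src,
                       const struct free_block *blocks, int n)
{
    for (int i = 0; i < n; i++) {
        if (!blocks[i].free)
            continue;
        /* Proposed policy: never migrate a page into a CMA block if the
         * page allocator would not have allowed it there. */
        if (blocks[i].type == BLOCK_CMA && !src->cma_ok)
            continue;
        return i;
    }
    return -1;
}
```

With one free CMA block at index 0 and one free movable block at index 1, a page with `cma_ok == false` is given index 1, while a page with `cma_ok == true` may take index 0.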


> What's special about this page? Or were you just unlucky?
We didn't find anything special about this page. During our debugging,
we found that once this problem occurs on a page, the page can no
longer be migrated (retrying an hour later does not help).
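A rough sketch of why retrying does not help, modeled loosely on the refcount/mapcount check described earlier in the thread. All names (`struct page_refs`, `transient_refs`, `pinned_refs`, `looks_migratable()`, `migrate_with_retries()`) are hypothetical, and the real kernel computes the expected reference count differently depending on the page type. A transient extra reference eventually goes away, so a retry succeeds. A reference that is never dropped makes every attempt fail.

```c
#include <stdbool.h>

/* Hypothetical, simplified model: not the kernel's expected-refcount logic. */
struct page_refs {
    int mapcount;        /* page-table mappings */
    int transient_refs;  /* extra references dropped after a short time */
    int pinned_refs;     /* extra references that are never dropped */
};

static int ref_count(const struct page_refs *p)
{
    /* 1 for the migrator itself, one per mapping, plus any extra holders */
    return 1 + p->mapcount + p->transient_refs + p->pinned_refs;
}

/* Movable only if every reference besides the migrator's own is a mapping. */
static bool looks_migratable(const struct page_refs *p)
{
    return ref_count(p) - 1 <= p->mapcount;
}

static bool migrate_with_retries(struct page_refs *p, int retries)
{
    for (int i = 0; i < retries; i++) {
        if (looks_migratable(p))
            return true;
        /* Simulate a transient holder finishing between attempts;
         * pinned references are never released. */
        if (p->transient_refs > 0)
            p->transient_refs--;
    }
    return false;
}
```

A page with one mapping and one transient extra reference passes on the second attempt; the same page with a pinned extra reference fails however many retries are allowed.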