Re: [PATCH] mm, memory_hotplug: do not back off draining pcp free pages from kworker context

From: Michal Hocko
Date: Tue Aug 29 2017 - 09:53:46 EST


On Mon 28-08-17 15:33:59, Andrew Morton wrote:
> On Mon, 28 Aug 2017 11:33:41 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> > drain_all_pages backs off when called from a kworker context since
> > 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue
> > context") because the original IPI based pcp draining has been replaced
> > by a WQ based one and the check was meant to prevent recursion and
> > inter-worker dependencies. This made some sense at the time
> > because the system WQ was used and one worker holding the lock
> > could be blocked while waiting for new workers to emerge, which can be
> > a problem under OOM conditions.
> >
> > Since then ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
> > wq") has moved draining to a dedicated WQ (mm_percpu_wq) with a rescuer,
> > so we no longer depend on any other WQ activity to make forward
> > progress. Calling drain_all_pages from a worker context is therefore
> > safe as long as this doesn't happen from mm_percpu_wq itself, which is
> > not the case because all workers are required to _not_ depend on any
> > MM locks.
> >
> > Why is this a problem in the first place? ACPI driven memory hot-remove
> > (acpi_device_hotplug) is executed from the worker context. We end
> > up calling __offline_pages to free all the pages, and that requires
> > both lru_add_drain_all_cpuslocked and drain_all_pages to do their job.
> > Otherwise we can have dangling pages on pcp lists and fail the offline
> > operation (__test_page_isolated_in_pageblock would see a page with 0
> > ref. count but without PageBuddy set).
> >
> > Fix the issue by removing the worker check in drain_all_pages.
> > lru_add_drain_all_cpuslocked doesn't have this restriction so it works
> > as expected.
> >
> > Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> No cc:stable?

I wouldn't be opposed. I have just seen so many things broken in this
area that I didn't consider this important enough. This would be 4.11+.
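
If it does go to stable, the tag would presumably follow the usual
convention from the stable kernel rules (a sketch only, not part of the
posted patch):

    Cc: <stable@vger.kernel.org> # 4.11+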

Thanks!
--
Michal Hocko
SUSE Labs