Re: [PATCH v3] mm: fix tick timer stall during deferred page init

From: Daniel Jordan
Date: Thu Mar 19 2020 - 15:05:10 EST


On Wed, Mar 11, 2020 at 08:38:48PM +0800, Shile Zhang wrote:

Sorry, I'm late to this.

I don't have a better solution, but I did try to find a way to stop holding the
resize lock during (most of) page init, which would make this fix unnecessary
and the deferred_init_memmap context less strange. Here are some ideas that
didn't work out in case someone sees a different way forward.

One thought is to unify the common parts of deferred_init_memmap and
deferred_grow_zone and have callers grab chunks of pages to initialize and note
the next available page to initialize for the next caller. Interrupt handlers
participate in page init while it's happening rather than having to wait until
it's finished. But what if a partially completed chunk is interrupted midway
through and the interrupt handler needs to allocate those in-progress pages?
May be possible to guarantee some memory is available if some minimum number of
chunks have been completed already, but it's hard to say what that number is if
the amount of memory handlers might allocate is unbounded.

Given that large allocations from interrupt handlers is a theoretical issue,
another thought is to reserve one section for deferred_grow_zone, should it be
called during page init, and if not then the pgdatinit thread could initialize
it with the resize lock held after the rest of page init is finished.
Meanwhile regular page init need not hold the resize lock. If interrupt
handlers try to allocate more than a section during this time, trigger a
warning so we know the issue isn't theoretical. The downside is that it's
possible this may not fix it for good.

> @@ -1811,9 +1816,23 @@ static int __init deferred_init_memmap(v
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn)
> + while (spfn < epfn) {
> nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> + /*
> + * Release the interrupts for every TICK_PAGE_COUNT pages
> + * (128MB) to give tick timer the chance to update the
> + * system jiffies.
> + */
> + if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) {
> + prev_nr_pages = nr_pages;
> + pgdat->first_deferred_pfn = spfn;
> + pgdat_resize_unlock(pgdat, &flags);
> + goto again;
> + }
> + }
> +

Nits only:

- s/Release the interrupts/Enable interrupts/
- take out 128MB, that assumes PAGE_SIZE is 4k

I considered saving i, spfn, and epfn in pgdat to avoid having to rerun
deferred_init_mem_pfn_range_in_zone every retry, but it'd enlarge pgdat for
short-lived data and the function probably isn't expensive.

Regardless,
Reviewed-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>