Re: Bug in __alloc_pages()?

From: Matthew Dobson
Date: Fri Mar 18 2005 - 17:30:27 EST


Nick Piggin wrote:
Matthew Dobson wrote:

While looking at some bugs related to OOM handling in 2.6, Martin Bligh and I noticed some order 0 page allocation failures from kswapd:

>> <snip>

If, while the system is under memory pressure, something attempts to allocate a page from interrupt context while current == kswapd we will obviously fail the !in_interrupt() check and fall through. If this allocation request was made with __GFP_WAIT set then we'll fall through the next !wait check. We will then set the PF_MEMALLOC flag and set p->reclaim_state to point to __alloc_pages() local reclaim_state structure. kswapd alread has it's own reclaim_state and already has PF_MEMALLOC set, which would then be lost when, after try_to_free_pages(), we unconditionally set the reclaim_state to NULL and turn off the PF_MEMALLOC flag.

I'm not 100% sure that this potential bug is even possible (ie: can we have an in_interrupt() page request that has __GFP_WAIT set?), or is the cause of the 0-order page allocation failures we see, but it does seem like potentially dangerous code. I have attatched a patch (against 2.6.11-mm4) to check whether the current task has it's own reclaim_state or already has PF_MEMALLOC set and if so, no longer throws away this data.


I don't think in_interrupt allocations can have __GFP_WAIT set, so
this should probably be OK.

Nick

Agreed. It seems unlikely, but not entirely impossible. All it would take is one sloppily coded driver, right? How about this patch instead?

-Matt diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.11-mm4/mm/page_alloc.c linux-2.6.11-mm4+fix-__alloc_pages/mm/page_alloc.c
--- linux-2.6.11-mm4/mm/page_alloc.c 2005-03-16 16:07:49.000000000 -0800
+++ linux-2.6.11-mm4+fix-__alloc_pages/mm/page_alloc.c 2005-03-18 14:10:27.433667720 -0800
@@ -957,8 +957,10 @@ rebalance:
cond_resched();

/* We now go into synchronous reclaim */
+ BUG_ON(p->flags & PF_MEMALLOC);
p->flags |= PF_MEMALLOC;
reclaim_state.reclaimed_slab = 0;
+ BUG_ON(p->reclaim_state);
p->reclaim_state = &reclaim_state;

did_some_progress = try_to_free_pages(zones, gfp_mask, order);