Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2

From: Mel Gorman
Date: Thu Oct 22 2009 - 12:03:16 EST


On Thu, Oct 22, 2009 at 05:47:10PM +0300, Pekka Enberg wrote:
> On Thu, Oct 22, 2009 at 5:22 PM, Mel Gorman <mel@xxxxxxxxx> wrote:
> > Test 1: Verify your problem occurs on 2.6.32-rc5 if you can
> >
> > Test 2: Apply the following two patches and test again
> >
> >  1/5 page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
> >  2/5 page allocator: Do not allow interrupts to use ALLOC_HARDER
>
> These are pretty obvious bug fixes and should go to linux-next ASAP IMHO.
>

Agreed, but I wanted to pin down exactly where we stand with this
problem before sending patches in any direction for merging.

> > Test 5: If things are still screwed, apply the following
> >  5/5 Revert 373c0a7e, 8aa7e847: Fix congestion_wait() sync/async vs read/write confusion
> >
> >        Frans Pop reports that the bulk of his problems go away when this
> >        patch is reverted on 2.6.31. There has been some confusion on why
> >        exactly this patch was wrong but apparently the conversion was not
> >        complete and further work was required. It's unknown if all the
> >        necessary work exists in 2.6.31-rc5 or not. If there are still
> >        allocation failures and applying this patch fixes the problem,
> >        there are still snags that need to be ironed out.
>
> As explained by Jens Axboe, this changes timing but is not the source
> of the OOMs so the revert is bogus even if it "helps" on some
> workloads. IIRC the person who reported the revert to help things did
> report that the OOMs did not go away, they were simply harder to
> trigger with the revert.
>

IIRC, there were mixed reports as to how much the revert helped. I'm hoping
that patches 1+2 cover the bases, which is why I asked for them to be tested
on their own. Patch 2 in particular might be responsible for watermarks being
impacted enough to cause timing problems. I left the revert in patch 5 as
a standalone test to see how much of a factor those timing changes are if
there are still allocation problems.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab