Re: [PATCH 0/3] Reduce impact to overall system of SLUB usinghigh-order allocations

From: Mel Gorman
Date: Thu May 12 2011 - 09:20:26 EST


On Thu, May 12, 2011 at 02:13:44PM +0300, Pekka Enberg wrote:
> On 5/12/11 1:34 AM, James Bottomley wrote:
> >On Wed, 2011-05-11 at 15:28 -0700, David Rientjes wrote:
> >>On Wed, 11 May 2011, James Bottomley wrote:
> >>
> >>>OK, I confirm that I can't seem to break this one. No hangs visible,
> >>>even when loading up the system with firefox, evolution, the usual
> >>>massive untar, X and even a distribution upgrade.
> >>>
> >>>You can add my tested-by
> >>>
> >>Your system still hangs with patches 1 and 2 only?
> >Yes, but only once in all the testing. With patches 1 and 2 the hang is
> >much harder to reproduce, but it still seems to be present if I hit it
> >hard enough.
>
> Patches 1-2 look reasonable to me. I'm not completely convinced of
> patch 3, though. Why are we seeing these problems now?

I'm not certain and testing so far as only being able to point to changing
from SLAB to SLUB between 2.6.37 and 2.6.38. This probably boils down to
distributions changing their allocator from slab to slub as recommended by
Kconfig and SLUB being tested heavily on desktop workloads in a variety of
settings for the first time. It's worth noting that only a few users have
been able to reproduce this. I don't see the severe hangs for example during
tests meaning it might also be down to newer hardware. What may be required
to reproduce this is many CPUs (4 on the test machines) with relatively
low memory for a 4-CPU machine (2G) and a slower disk than people might
have tested with up until now.

There are other new considerations as well that weren't much of a factor
when SLUB came along. The first reproduction case showed involved ext4 for
example which does delayed block allocation. It's possible there is some
problem wherby all the dirty pages to be written to disk need blocks to
be allocated and GFP_NOFS is not being used properly. Instead of failing
the high-order allocation, we then block instead hanging direct reclaimers
and kswapd. The filesystem people looked at this bug but didn't mention if
something like this was a possibility.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/