Re: [PATCH 4/5] mm: page_alloc: Reduce cost of the fair zone allocation policy

From: Johannes Weiner
Date: Mon Jun 30 2014 - 10:41:51 EST


On Fri, Jun 27, 2014 at 08:25:37PM +0100, Mel Gorman wrote:
> On Fri, Jun 27, 2014 at 02:57:00PM -0400, Johannes Weiner wrote:
> > On Fri, Jun 27, 2014 at 09:14:39AM +0100, Mel Gorman wrote:
> > > And the number of pages allocated from each zone is comparable
> > >
> > >                       3.16.0-rc2    3.16.0-rc2
> > >                         checklow      fairzone
> > > DMA allocs                     0             0
> > > DMA32 allocs             7374217       7920241
> > > Normal allocs          999277551     996568115
> >
> > Wow, the DMA32 zone gets less than 1% of the allocations. What are
> > the zone sizes in this machine?
> >
>
> DMA    managed 3976
> DMA32  managed 755409
> Normal managed 1281601

Something seems way off with this. On my system here, the DMA32 zone
makes up 20% of managed pages and it gets roughly 20% of the page
allocations, as I would expect.

Your DMA32 zone makes up 37% of the managed pages yet receives
merely 0.7% of the page allocations. Unless a large portion of that
zone is somehow unreclaimable, fairness seems completely obliterated
in both kernels.

Is that checklow's doing?

> > > @@ -3287,10 +3287,18 @@ void show_free_areas(unsigned int filter)
> > > show_swap_cache_info();
> > > }
> > >
> > > -static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
> > > +static int zoneref_set_zone(pg_data_t *pgdat, struct zone *zone,
> > > + struct zoneref *zoneref, struct zone *preferred_zone)
> > > {
> > > + int zone_type = zone_idx(zone);
> > > + bool fair_enabled = zone_local(zone, preferred_zone);
> > > + if (zone_type == 0 &&
> > > + zone->managed_pages < (pgdat->node_present_pages >> 4))
> > > + fair_enabled = false;
> >
> > This needs a comment.
> >
>
> /*
> * Do not count the lowest zone as relevant to the fair zone
> * allocation policy if it is only a small fraction of the node
> */
>
> However, as I write this I'll look at getting rid of this entirely. It
> made some sense when fair_eligible was tracked on a per-zone basis but
> it's more complex than necessary.
>
> > > zoneref->zone = zone;
> > > - zoneref->zone_idx = zone_idx(zone);
> > > + zoneref->zone_idx = zone_type;
> > > + return fair_enabled;
> > > }
> > >
> > > /*
> > > @@ -3303,17 +3311,26 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
> > > {
> > > struct zone *zone;
> > > enum zone_type zone_type = MAX_NR_ZONES;
> > > + struct zone *preferred_zone = NULL;
> > > + int nr_fair = 0;
> > >
> > > do {
> > > zone_type--;
> > > zone = pgdat->node_zones + zone_type;
> > > if (populated_zone(zone)) {
> > > - zoneref_set_zone(zone,
> > > - &zonelist->_zonerefs[nr_zones++]);
> > > + if (!preferred_zone)
> > > + preferred_zone = zone;
> > > +
> > > + nr_fair += zoneref_set_zone(pgdat, zone,
> > > + &zonelist->_zonerefs[nr_zones++],
> > > + preferred_zone);
> >
> > Passing preferred_zone to determine locality seems pointless when you
> > walk the zones of a single node.
> >
>
> True.
>
> > And the return value of zoneref_set_zone() is fairly unexpected.
> >
>
> How so?

Given the name zoneref_set_zone(), I wouldn't expect any return value,
or a success/failure type return value at best - certainly not whether
the passed zone is eligible for the fairness policy.

> > It's probably better to determine fair_enabled in the callsite, that
> > would fix both problems, and write a separate helper that tests if a
> > zone is eligible for fair treatment (type && managed_pages test).
> >
>
> Are you thinking of putting that into the page allocator fast path? I'm
> trying to take stuff out of there :/.

Not at all, I was just suggesting to restructure the code for building
the zonelists, and move the fairness stuff out of zoneref_set_zone().

If you remove the small-zone exclusion as per above, this only leaves
the locality check when building the zonelist in zone order and that
can easily be checked inline in build_zonelists_in_zone_order().

build_zonelists_node() can just count every populated zone in nr_fair.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/