Re: [RFC][Patch v11 1/2] mm: page_hinting: core infrastructure

From: Dave Hansen
Date: Thu Jul 11 2019 - 14:21:52 EST


On 7/10/19 12:51 PM, Nitesh Narayan Lal wrote:
> +static void bm_set_pfn(struct page *page)
> +{
> + struct zone *zone = page_zone(page);
> + int zone_idx = page_zonenum(page);
> + unsigned long bitnr = 0;
> +
> + lockdep_assert_held(&zone->lock);
> + bitnr = pfn_to_bit(page, zone_idx);
> + /*
> + * TODO: fix possible underflows.
> + */
> + if (free_area[zone_idx].bitmap &&
> + bitnr < free_area[zone_idx].nbits &&
> + !test_and_set_bit(bitnr, free_area[zone_idx].bitmap))
> + atomic_inc(&free_area[zone_idx].free_pages);
> +}

Let's say I have two NUMA nodes, each with ZONE_NORMAL and ZONE_MOVABLE
and each zone with 1GB of memory:

Node:        0        1
NORMAL    0->1GB   2->3GB
MOVABLE   1->2GB   3->4GB

This code will allocate two bitmaps. The ZONE_NORMAL bitmap will
represent data from 0->3GB and the ZONE_MOVABLE bitmap will represent
data from 1->4GB. That's the result of this code:

> + if (free_area[zone_idx].base_pfn) {
> + free_area[zone_idx].base_pfn =
> + min(free_area[zone_idx].base_pfn,
> + zone->zone_start_pfn);
> + free_area[zone_idx].end_pfn =
> + max(free_area[zone_idx].end_pfn,
> + zone->zone_start_pfn +
> + zone->spanned_pages);

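But that means each bitmap also has space for PFNs that belong to the
other zone type: the ZONE_NORMAL bitmap covers node 0's MOVABLE range
(1->2GB) and the ZONE_MOVABLE bitmap covers node 1's NORMAL range
(2->3GB). That is completely bogus. It is also a fundamental problem,
because the data structures are incorrectly built per zone *type*
instead of per zone.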
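
To make the example concrete, here is a minimal userspace sketch (not
kernel code) of the same min()/max() merge applied to the four zones in
the layout above. It assumes 4KB pages and zones without holes. The
names zone_sketch, span_sketch, span_for_type() and foreign_pfns() are
made up for illustration, and the sketch uses an explicit "valid" flag
instead of the patch's base_pfn check.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_GB	(1UL << 18)	/* assumption: 4KB pages */

/* Hypothetical stand-ins for the kernel's zone type and struct zone. */
enum zone_type_sketch { SK_NORMAL, SK_MOVABLE };

struct zone_sketch {
	enum zone_type_sketch type;
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* PFN range [base_pfn, end_pfn) that a per-type bitmap would cover. */
struct span_sketch {
	bool valid;
	unsigned long base_pfn;
	unsigned long end_pfn;
};

/* The two-node layout from the example: 1GB per zone. */
static const struct zone_sketch example_zones[] = {
	{ SK_NORMAL,  0 * PAGES_PER_GB, PAGES_PER_GB },	/* node 0 */
	{ SK_MOVABLE, 1 * PAGES_PER_GB, PAGES_PER_GB },	/* node 0 */
	{ SK_NORMAL,  2 * PAGES_PER_GB, PAGES_PER_GB },	/* node 1 */
	{ SK_MOVABLE, 3 * PAGES_PER_GB, PAGES_PER_GB },	/* node 1 */
};

/* Merge every zone of one type into a single span, min()/max() style. */
static struct span_sketch span_for_type(const struct zone_sketch *zones,
					size_t nr,
					enum zone_type_sketch type)
{
	struct span_sketch s = { false, 0, 0 };
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long start = zones[i].start_pfn;
		unsigned long end = start + zones[i].spanned_pages;

		if (zones[i].type != type)
			continue;
		if (!s.valid) {
			s.valid = true;
			s.base_pfn = start;
			s.end_pfn = end;
			continue;
		}
		if (start < s.base_pfn)
			s.base_pfn = start;
		if (end > s.end_pfn)
			s.end_pfn = end;
	}
	return s;
}

/* PFNs inside the merged span that no zone of this type actually owns. */
static unsigned long foreign_pfns(const struct zone_sketch *zones,
				  size_t nr,
				  enum zone_type_sketch type)
{
	struct span_sketch s = span_for_type(zones, nr, type);
	unsigned long own = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (zones[i].type == type)
			own += zones[i].spanned_pages;

	return s.valid ? (s.end_pfn - s.base_pfn) - own : 0;
}
```

For the layout above, span_for_type() yields 0->3GB for NORMAL and
1->4GB for MOVABLE, and foreign_pfns() reports 1GB worth of PFNs in each
bitmap that belong to the other zone type. Tracking one span per zone
(per node) would leave no such foreign range in this example.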