Re: [patch 10/10] mm: keep page cache radix tree nodes in check

From: Andrew Morton
Date: Tue Feb 04 2014 - 18:08:18 EST


On Mon, 3 Feb 2014 19:53:42 -0500 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> Previously, page cache radix tree nodes were freed after reclaim
> emptied out their page pointers. But now reclaim stores shadow
> entries in their place, which are only reclaimed when the inodes
> themselves are reclaimed. This is problematic for bigger files that
> are still in use after they have a significant amount of their cache
> reclaimed, without any of those pages actually refaulting. The shadow
> entries will just sit there and waste memory. In the worst case, the
> shadow entries will accumulate until the machine runs out of memory.
>
> To get this under control, the VM will track radix tree nodes
> exclusively containing shadow entries on a per-NUMA node list.
> Per-NUMA rather than global because we expect the radix tree nodes
> themselves to be allocated node-locally and we want to reduce
> cross-node references of otherwise independent cache workloads. A
> simple shrinker will then reclaim these nodes on memory pressure.
>
> A few things need to be stored in the radix tree node to implement the
> shadow node LRU and allow tree deletions coming from the list:
>
> 1. There is no index available that would describe the reverse path
> from the node up to the tree root, which is needed to perform a
> deletion. To solve this, encode in each node its offset inside the
> parent. This can be stored in the unused upper bits of the same
> member that stores the node's height at no extra space cost.
>
> 2. The number of shadow entries needs to be counted in addition to the
> regular entries, to quickly detect when the node is ready to go to
> the shadow node LRU list. The current entry count is an unsigned
> int but the maximum number of entries is 64, so a shadow counter
> can easily be stored in the unused upper bits.
>
> 3. Tree modification needs tree lock and tree root, which are located
> in the address space, so store an address_space backpointer in the
> node. The parent pointer of the node is in a union with the 2-word
> rcu_head, so the backpointer comes at no extra cost as well.
>
> 4. The node needs to be linked to an LRU list, which requires a list
> head inside the node. This does increase the size of the node, but
> it does not change the number of objects that fit into a slab page.
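For readers following along, points 1 and 2 above come down to packing two small values into one unsigned int. Below is a minimal userspace sketch of that idea, not the code from the patch: the struct, the helper names, the 4-bit height field, and the choice to count pages in the low bits and shadows in the upper bits are all assumptions made for illustration. Only the 64-entry maximum comes from the changelog.

```c
#include <assert.h>

/* Hypothetical layout constants, for illustration only. */
#define SK_MAP_SHIFT    6u                        /* 64 slots per node */
#define SK_COUNT_SHIFT  (SK_MAP_SHIFT + 1u)       /* 7 bits hold 0..64 */
#define SK_COUNT_MASK   ((1u << SK_COUNT_SHIFT) - 1u)
#define SK_HEIGHT_SHIFT 4u                        /* assumed: height fits in 4 bits */
#define SK_HEIGHT_MASK  ((1u << SK_HEIGHT_SHIFT) - 1u)

struct sketch_node {
    unsigned int path;   /* low bits: height, upper bits: offset in parent */
    unsigned int count;  /* low bits: page entries, upper bits: shadow entries */
};

static inline void sk_set_path(struct sketch_node *n,
                               unsigned int height, unsigned int offset)
{
    assert(height <= SK_HEIGHT_MASK);
    assert(offset < (1u << SK_MAP_SHIFT));
    n->path = (offset << SK_HEIGHT_SHIFT) | height;
}

static inline unsigned int sk_height(const struct sketch_node *n)
{
    return n->path & SK_HEIGHT_MASK;
}

static inline unsigned int sk_offset(const struct sketch_node *n)
{
    return n->path >> SK_HEIGHT_SHIFT;
}

static inline unsigned int sk_pages(const struct sketch_node *n)
{
    return n->count & SK_COUNT_MASK;
}

static inline unsigned int sk_shadows(const struct sketch_node *n)
{
    return n->count >> SK_COUNT_SHIFT;
}

/* Adding 1 touches only the low (page) field. */
static inline void sk_inc_pages(struct sketch_node *n)   { n->count += 1u; }
static inline void sk_dec_pages(struct sketch_node *n)   { n->count -= 1u; }

/* Adding 1 << SK_COUNT_SHIFT touches only the upper (shadow) field. */
static inline void sk_inc_shadows(struct sketch_node *n) { n->count += 1u << SK_COUNT_SHIFT; }
static inline void sk_dec_shadows(struct sketch_node *n) { n->count -= 1u << SK_COUNT_SHIFT; }

/* A node qualifies for the shadow list once it holds shadows and no pages. */
static inline int sk_shadow_only(const struct sketch_node *n)
{
    return sk_pages(n) == 0 && sk_shadows(n) > 0;
}
```

With this layout the "ready for the shadow node LRU" check is a mask and a shift on a single word, which is presumably why the changelog describes it as quick.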

changelog forgot to mention that this reclaim is performed via a
shrinker...

How expensive is that list walk in scan_shadow_nodes()? I assume in
the best case it will bail out after nr_to_scan iterations?
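To make the question concrete, here is a minimal userspace sketch of a list walk whose cost is capped by nr_to_scan. It does not reproduce scan_shadow_nodes() from the patch: the list type, the `reclaimable` flag, and the function names are hypothetical, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree node sitting on the shadow list. */
struct shadow_item {
    struct shadow_item *next;
    int reclaimable;          /* 1 if the node still holds only shadow entries */
};

struct shadow_list {
    struct shadow_item *head;
    unsigned long nr_items;
};

static inline void shadow_list_push(struct shadow_list *l, struct shadow_item *it)
{
    it->next = l->head;
    l->head = it;
    l->nr_items++;
}

/*
 * Visit at most nr_to_scan items from the head of the list, unlink the
 * reclaimable ones, and return how many were unlinked. The walk stops as
 * soon as the budget is used up or the list ends, so the cost per call is
 * bounded by nr_to_scan regardless of the list length.
 */
static inline unsigned long scan_bounded(struct shadow_list *l,
                                         unsigned long nr_to_scan)
{
    struct shadow_item **link = &l->head;
    unsigned long freed = 0;

    while (*link != NULL && nr_to_scan > 0) {
        struct shadow_item *it = *link;

        nr_to_scan--;
        if (it->reclaimable) {
            *link = it->next;   /* unlink; the caller would free the node */
            it->next = NULL;
            l->nr_items--;
            freed++;
        } else {
            link = &it->next;   /* skip items that cannot be reclaimed now */
        }
    }
    return freed;
}
```

In this sketch, skipped items stay at the head of the list and are visited again on the next call, so a real implementation would probably want to rotate or otherwise move past them.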
