32bit architectures and __HAVE_ARCH_PTE_SWP_EXCLUSIVE

From: David Hildenbrand
Date: Tue Nov 22 2022 - 09:07:47 EST


Hi all,

Spoiler: is there a real use case for > 16 GiB of swap in a single file on 32bit architectures?


I'm currently looking into implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE support for all remaining architectures. So far, I have only implemented it for the most relevant enterprise architectures.


With __HAVE_ARCH_PTE_SWP_EXCLUSIVE, when we unmap a page for swapout and replace the present PTE by a swap PTE, we remember whether the anonymous page that was mapped was exclusive (PageAnonExclusive(), i.e., not COW-shared). When refaulting that page, whereby we replace the swap PTE by a present PTE, we can reuse that information to map the page writable and avoid unnecessary page copies due to COW, even if there are still unexpected references on the page.

While this would usually be a pure optimization, O_DIRECT currently still (wrongly) uses FOLL_GET instead of FOLL_PIN and can trigger memory corruption in corner cases. So for that case, it is also a temporary fix until O_DIRECT properly uses FOLL_PIN. More details can be found in [1].


Ideally, I'd just implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures. However, __HAVE_ARCH_PTE_SWP_EXCLUSIVE requires an additional bit in the swap PTE. While mostly unproblematic on 64bit, for 32bit this implies that we'll have to "steal" one bit from the swap offset on most architectures, reducing the maximum swap size per file.
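
To make the "stolen bit" concrete, here is a minimal user-space sketch of a 32bit swap PTE that carries an exclusive marker. The bit layout, the macro names and the swp_pte_*() helpers are all made up for illustration; they only loosely mirror the idea behind the kernel's pte_swp_mkexclusive() / pte_swp_exclusive() helpers, and every real architecture has its own layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 32bit swap PTE layout (an assumption, not any real arch):
 *   bits 0..1  : kept zero so the PTE is never mistaken for a present one
 *   bits 2..6  : swap type (5 bits)
 *   bit  7     : exclusive marker, the bit "stolen" from the offset
 *   bits 8..31 : swap offset (24 bits; it would be 25 without the marker)
 */
#define SWP_TYPE_SHIFT    2
#define SWP_TYPE_BITS     5
#define SWP_EXCL_BIT      (UINT32_C(1) << 7)
#define SWP_OFFSET_SHIFT  8
#define SWP_OFFSET_BITS   24

/* Encode a (type, offset) pair; the exclusive marker starts out clear. */
static inline uint32_t swp_pte_make(uint32_t type, uint32_t offset)
{
	assert(type < (UINT32_C(1) << SWP_TYPE_BITS));
	assert(offset < (UINT32_C(1) << SWP_OFFSET_BITS));
	return (type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
}

static inline uint32_t swp_pte_type(uint32_t pte)
{
	return (pte >> SWP_TYPE_SHIFT) & ((UINT32_C(1) << SWP_TYPE_BITS) - 1);
}

static inline uint32_t swp_pte_offset(uint32_t pte)
{
	return pte >> SWP_OFFSET_SHIFT;
}

/* Set, clear and test the marker without touching type or offset. */
static inline uint32_t swp_pte_mkexclusive(uint32_t pte)
{
	return pte | SWP_EXCL_BIT;
}

static inline uint32_t swp_pte_clear_exclusive(uint32_t pte)
{
	return pte & ~SWP_EXCL_BIT;
}

static inline bool swp_pte_exclusive(uint32_t pte)
{
	return (pte & SWP_EXCL_BIT) != 0;
}
```

In this sketch the marker costs exactly one offset bit: the same 32 bits could otherwise hold a 25-bit offset.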


Assuming we previously supported 32 GiB per swap file (e.g., hexagon, csky), this number would get reduced to 16 GiB. The kernel would automatically truncate the oversized swap area and the system would continue working, just using less of that swap file, but ... well, is there a but?
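
The arithmetic behind those numbers can be sketched as follows. This assumes 4 KiB pages (page shift of 12) and a 23-bit swap offset, which is what a 32 GiB limit works out to under that page size; the function name is made up, and the kernel's actual limit calculation is arch-specific and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the size of one swap file, in bytes, when the swap PTE
 * can encode 2^offset_bits page offsets and pages are 2^page_shift bytes.
 * Hypothetical helper for illustration only.
 */
static inline uint64_t max_swap_bytes(unsigned int offset_bits,
				      unsigned int page_shift)
{
	assert(offset_bits + page_shift < 64);
	return UINT64_C(1) << (offset_bits + page_shift);
}
```

With these assumptions, max_swap_bytes(23, 12) is 2^35 bytes (32 GiB), and giving up one offset bit, max_swap_bytes(22, 12), yields 2^34 bytes (16 GiB): each stolen bit halves the per-file limit.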

Usually (well, there is PAE on x86 ...), a 32bit system can address 4 GiB of memory. Maximum swap size recommendations seem to be around 2--3 times the memory size (2x without hibernation, 3x with hibernation). So it sounds like there is barely a use case for more swap space. Of course one can use multiple swap files.


So, is anybody aware of excessive swap space requirements on 32bit?


Note that I thought about storing the exclusive marker in the swap_map instead of in the swap PTE, but quickly decided to discard that idea because it results in significantly more complexity and the swap code is already horrible enough.

[1] https://lkml.kernel.org/r/20220329164329.208407-1-david@xxxxxxxxxx

--
Thanks,

David / dhildenb