Overstep the kernel 512GB mmap limit ?

From: Xavier Roche
Date: Thu Apr 29 2004 - 04:28:14 EST


Hi folks,

There is currently a per-process limit in the kernel VM that prevents a process from
mmap'ing more than 512GB of data. As far as I understand, this 512GB limit also covers the
code and data, the heap (all growing up), and the stack (growing down). It is possible to
tune the boundary between the mmap area and the stack, but the 512GB limit always remains.
This matter was previously raised by Andrea Arcangeli and Andi Kleen. At that time it was
not considered a critical issue, but something that might be solved later, maybe in the
upcoming 2.7.

Here's an ugly ascii map (time to switch to fixed font :) ):

+-------------------------------+ 512G (the "512 barrier")
| stack          ||             |
|                ||             |
|                \/             |
|                               |
|                /\             |
|                ||             |
| mmap's         ||             |
+-------------------------------+ TASK_UNMAPPED_BASE (1/3 of 512G?)
|                /\             |
|                ||             |
| code/data/heap ||             |
+-------------------------------+ 1GB
| not mapped (?)                |
+-------------------------------+ 0
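
To check where these regions actually sit for a given process, one can read
/proc/self/maps. The following Python sketch parses that file's standard
"start-end perms offset dev inode pathname" format. The names `Mapping` and
`parse_maps` are hypothetical helpers introduced for illustration only, and the
region labels (such as [stack]) depend on the kernel version.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # first address past the region
    perms: str   # e.g. "r-xp"
    name: str    # backing file or label, "" if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of /proc/<pid>/maps into Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], name))
    return mappings


if __name__ == "__main__":
    # Linux only: print each region of the current process with its size.
    with open("/proc/self/maps") as f:
        for m in parse_maps(f.read()):
            size_kib = (m.end - m.start) >> 10
            print(f"{m.start:#018x}-{m.end:#018x} {m.perms} {size_kib:>12} KiB {m.name}")
```

Comparing the highest address printed with the 512G mark shows how close a
process is to the barrier.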

Now that 64-bit processors are cheap and widely used, mapping areas larger than the
classical 32-bit address space is common. And while having 512GB of RAM is not really
common, mounting a few TB of data now is (512GB is only a matter of two cheap IDE disks
in RAID).
The problem is that when working with huge filesystems or files accessed through mmap, or
with huge databases, you are limited by this barrier, even on 64-bit architectures.
We (Exalead) have reached this barrier several times on Linux when dealing with big
"userspace filesystem" contents.

According to Andi Kleen, the limit is related to the generic vm kernel code which only
supports 3 levels of page tables.
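
To make the arithmetic behind that limit concrete, here is a small Python sketch.
It assumes x86-64-style paging (4 KiB pages and 512 entries per table, so 9
address bits per level). Those constants and the helper names
`addressable_bytes` and `human` are illustrative assumptions, not taken from the
kernel source.

```python
# Assumption: 4 KiB pages (12 offset bits) and 512-entry tables (9 bits per level).
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9


def addressable_bytes(levels: int,
                      page_shift: int = PAGE_SHIFT,
                      bits_per_level: int = BITS_PER_LEVEL) -> int:
    """Bytes of virtual address space reachable with `levels` page-table levels."""
    return 1 << (page_shift + levels * bits_per_level)


def human(n: int) -> str:
    """Format an exact power-of-two byte count with a binary unit."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} EiB"


if __name__ == "__main__":
    for levels in (3, 4):
        print(f"{levels} levels: {human(addressable_bytes(levels))}")
```

Under those assumptions, three levels cover 2^39 bytes, i.e. 512 GiB, and a
fourth level would cover 2^48 bytes, i.e. 256 TiB.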

Would it be possible (Andrew / Linus?) to consider including in the kernel an option that
lets a process mmap more than 512GB of data? [It would have to be an option, as more than
3 levels of page tables could decrease performance in the VM, which is a bottleneck.]
Andi Kleen told me that he is willing to help in this direction, but maintaining such a
major, critical patch outside the kernel is not an easy thing to do.


--
Regards,
Xavier
