Re: __get_free_pages()

Andrej Presern (andrejp@luz.fe.uni-lj.si)
Sat, 30 May 1998 14:02:23 +0200


Watching this thread, it seems to me that the current physical page
allocation mechanism could use a change or two. The physical page
allocator that I would like to describe has been designed for CAOS and
some parts will require an advanced vmm, but I believe that a couple of
ideas could be adopted for Linux as well. Since this is a preliminary
design, I'd like to get some feedback on it.

Anyway, here's the relevant part of the paper:

STORAGE
-------

1. Physical memory
------------------

Pages of physical memory are managed in a straightforward array of page
descriptors, where each page descriptor describes exactly one physical
page. To enable easy allocation and deallocation of pages, pointers to
the first and last entry in the array are kept.

+------+
| pd 1 | <--- first free page
+------+
| pd 2 |
+------+
| pd 3 |
+------+
| pd 4 |
+------+
| pd 5 | <--- last free page
+------+

Allocation and deallocation
---------------------------

To allocate a number of physical pages, the pointer to the first free
page (FFP) is increased to skip the specified number of page descriptors
that can be installed into a domain:

+------+
| pd 1 |
+------+
| pd 2 |
+------+
| pd 3 | <--- first free page after we allocated 2 pages
+------+
| pd 4 |
+------+
| pd 5 | <--- last free page
+------+

To return a number of allocated physical pages into the pool, the
pointer to the last free page (LFP) is increased to make room for the
specified number of pages and the skipped space can be populated with
page descriptors that aren't needed anymore. But this would mean that
the LFP pointer would go past the end of the page descriptor array,
which is solved by wrapping the pointer back to the beginning of the
array:

+------+
| pd 1 |
+------+
| pd 2 | <--- last free page after the 2 allocated pages have been freed
+------+
| pd 3 | <--- first free page
+------+
| pd 4 |
+------+
| pd 5 |
+------+

The same goes for FFP pointer: the pointer is wrapped to the beginning
when it falls beyond the end of array. The wrapper function itself is
quite simple:

if ( ffp > ptr_to_end_of_fd_array )
ffp -= sizeof( fd_array );

This way the array is effectively made into a sequential allocation
queue, where deallocated pages are inserted at the end and the allocated
ones are removed from the beginning of the queue, allowing for efficient
allocation and deallocation of physical pages.

The presented mechanism assumes that only one user will be accessing the
array and modifying the pointers at a time. In an environment where
multiple objects will be accessing the queue simultaneously, special
care must be taken so that the pointers are not being modified by more
than one object at a time and that the deallocator doesn't overwrite the
page descriptors that are still being copied into the page directory by
the allocator.

To circumvent this problem, an access semaphore is needed. To avoid use
of general purpose semaphores and the overhead they produce, a simple
hack can be used: Because a page descriptor takes up at least 4 bytes on
a 32-bit machine, the FFP and LFP pointers will be increased in
multiples of 4 byte steps. If we align the data on a 4 byte boundary and
look at the address increase in the binary form, we can see that the two
least significant bits of the address always remain zero:

Address Hex Bin
76 = 0x4C = %01001100
+4 = 80 = 0x50 = %01010000
^^ unaltered bits

The two unused bits can be used for locking individual pointers by using
a 'bit test and set', 'get and increase' or similar atomic operation.
When the pointer needs to be unlocked, a new value can simply be stored
into the variable, overwriting the lock bits and automatically unlocking
the variable.

To minimize the time when the pointers are locked the above mechanism
can be extended to include some info on copyings that are currently in
progress. The deallocator can look at this info to see if it is going to
overwrite page descriptors that the allocator is still copying into the
page directory even though it already bumped the FFP pointer up to
enable the next allocation to take place while it copies the data. After
the allocator has finished copying the data it clears the job so that
the deallocator can overwrite the area.

This info can be implemented in the form of a meter capability, where
the meter is the address, to which allocator or deallocator can advance,
because there are other uses of the array taking place at the same time.

Fragmentation
-------------

Fragmentation of physical memory is something that is usually very
difficult to prevent. For performance reasons the described mechanism
does not even attempt to prevent it.

However, if contiguous regions of physical memory are needed in the
system, the page descriptor array can be sorted, giving an ordered list
of physical pages which can then be searched for a contiguous set of
pages that fits the demand. The hole that is left in the array after the
allocation can be repaired by filling it with page descriptors from the
beginning or the end of the queue.

Cost
----

The cost in resources is primarily in the waste of space. The majority
is wasted to make room for page descriptors for individual pages. The
percentage of permanently wasted space can be calculated using the
following formula:

WS% = 100 * sizeof( page descriptor ) / sizeof( page )

For Intel 32-bit x86 architecture, the value for WS% is approximately
0.0977% of all space (1 KB of wasted space per 1 MB of physical memory).

The amount of wasted space is a constant for the duration of a power-on
cycle of the computer system. If physical memory can be added or removed
without rebooting the system, the array of page descriptors must be
resized.

Notes
-----

It is trivial to create different pools of memory, which can be layered
if such functionality is desired. This allows for pools such as DMA-able
parts of address space, which can be organized in a different manner
than those intended for normal paging. All of the pools can be connected
to a bigger allocator module that will allocate page descriptors from
different pools, according to flags that it receives as an argument.

The new memory capability can contain meter capabilities to different
memory pools, which allows for fine grained control over how much memory
of different types can an object hold.

Optimization
------------

If the allocator is installed in a separate address space, no checks
need to be done whether the pointers should be wrapped. The pointers can
be left to simply grow and will wrap automatically when the address is
high enough. As the pointers grow, new physical pages can be inserted at
the beginning and unused ones can be removed from the end of the queue,
gaining additional free physical pages as the number of free pages is
decreasing. If any of the entries in the page directory are already
populated when a new page should be inserted at the beginning, the
pointers can skip them and continue after the populated area, skipping
code and other data areas.

Using this optimization, the number of pages that are taken up by the
allocator's data can be reduced to the two pointers, FFP and LFP
respectively, which means virtually zero wasted pages.

[eop]

Andrej

-- 
Andrej Presern, andrejp@luz.fe.uni-lj.si

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu