Re: Swapping algorithm

Hemment_Mark/HEMEL_IOPS_INTERNATIONAL-SALES@uniplex.co.uk
Thu, 2 May 96 10:37:21 +0100


> Jon Burgess wrote:
> On Tue, 23 Apr 1996, David S. Miller wrote:
>> Date: Mon, 22 Apr 96 18:26 BST
>> From: Jamie Lokier <jamie@rebellion.co.uk>
>>
>> [good idea to generate page-faults to gather usage stats]
>>
>> [idea of adding "current-jiffies" to unmapped pages]
>>
>> Later,
>> David S. Miller
>> davem@caip.rutgers.edu

> Here is a more considered view of how this might be done; it's the
> most efficient and easiest way I can think of.
>
> [big snip...]

> Swapping policy: swap out the pages that have been unmapped the
> longest, by giving every page a time reference, in jiffies, which,
> together with the status (normal, unmapped, swapped), tells us when
> the page was swapped out, swapped in, unmapped, or last accessed
> (detected via unmapping) - i.e. the last time it changed status.

> How do we do the unmapping?
> - Surely we can use most of the current swapping code to unmap, but
> not swap out, a process's access to a page of memory, generating a
> page fault on access, whereupon we give the page back and update
> our last_accessed time.

> How often do we do the unmapping?
> Perhaps all the time we're idle; the only penalty is an increase in
> process execution time as each process slowly gets its pages back.
> However, if this happened for every page the CPU accessed it wouldn't
> be good, and implementing a mechanism to back off from unmapping
> frequently used pages is hard.
> Maybe we sweep the page tables and unmap a fixed number of pages per
> sweep; these then have their timers updated if and when they are
> accessed again. Only used pages will get re-mapped, so after a while
> we'll just be unmapping frequently used pages - we must be careful
> not to do too much, though my machine normally has plenty of idle
> time. Perhaps we should decrease the number of pages scanned when
> little CPU time is spare.

> Really we ought to get some computer scientist to give us some
> ideas - aren't you meant to cover such algorithms in your courses?
> Has anyone got any references? Or does anyone know how other OSes
> do it better?

You are starting to get close to the "clock" algorithm (at least that's
what I know it as).

On some processors, you do not need to generate a page fault to be
able to detect if a page has been referenced. Within a PTE there
is a bit called the "reference-bit", which is set by the hardware
whenever the page is referenced. On other processors, mostly the
_true_ RISC ones (i.e. not Intel), the reference bit needs to be simulated
in software by unmapping pages to force page-faults.
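
To make that concrete: on the i386, for example, the "accessed" bit
lives in the PTE and the MMU sets it on any access. A sketch (the
macro name is invented for illustration, not taken from a real
header):

#include <stdint.h>

typedef uint32_t pte_t;

#define PTE_ACCESSED 0x20	/* bit 5 on the i386; set by the MMU on access */

/* Returns non-zero if the page was referenced since the last sweep,
 * and clears the bit so the next sweep starts from a clean state. */
static int test_and_clear_referenced(pte_t *pte)
{
	int referenced = (*pte & PTE_ACCESSED) != 0;

	*pte &= ~PTE_ACCESSED;
	return referenced;
}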

No matter how the reference bit is generated, it occasionally needs
to be tested, cleared, and something done with the pages. This is the
"clock".

I don't have a description of this algorithm with me, so this is from
memory (it's not very difficult).
The "clock" has two hands, which are really pointers to page info
structures. These structures are linked into a circular list.
The front-hand turns off the reference bit (and unmaps the page for
processors that need to simulate the bit). The back-hand (which trails
behind the front-hand by a number of pages) checks this bit. If the
bit is still off, the page hasn't been referenced "recently". Dirty
"unreferenced" pages are placed at the end of a list to be written out
to disk, and are unmapped from their process(es)' address space.
Clean pages are placed at the end of a "to-be-free-list", and are
also unmapped.
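
In rough C, one "tick" of the two hands could look something like this
(every type and helper below is invented for illustration, not taken
from any kernel):

struct page {
	struct page *next;	/* pages form a circular list */
	int referenced;		/* the (possibly simulated) ref-bit */
	int dirty;
};

static struct page *front_hand;	/* leads, clearing ref-bits */
static struct page *back_hand;	/* trails by N pages, testing them */

/* Stubs for illustration - the real work happens elsewhere. */
static void unmap_page(struct page *p) { (void)p; }
static void queue_for_writeout(struct page *p) { (void)p; }	/* dirty */
static void queue_to_be_freed(struct page *p) { (void)p; }	/* clean */

static void clock_tick(int nr_pages)
{
	while (nr_pages--) {
		/* Front hand: clear the bit (and unmap, on processors
		 * that have to simulate it) so the page must prove it
		 * is in use before the back hand arrives. */
		front_hand->referenced = 0;
		unmap_page(front_hand);
		front_hand = front_hand->next;

		/* Back hand: still clear means not referenced
		 * "recently" - queue the page for eviction. */
		if (!back_hand->referenced) {
			if (back_hand->dirty)
				queue_for_writeout(back_hand);
			else
				queue_to_be_freed(back_hand);
		}
		back_hand = back_hand->next;
	}
}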

If a process needs these pages again, then it will page-fault. The
faulting algorithm checks these lists and reclaims the page it needs
(if it's still on one of these lists, of course).
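
In the same invented terms as the sketch above, the fast path of the
fault handler is roughly:

/* More invented helpers: find a page sitting on the dirty or
 * to-be-freed lists, detach it, and map it back in. */
extern struct page *find_on_reclaim_lists(unsigned long addr);
extern void remove_from_reclaim_list(struct page *p);
extern void map_page(struct page *p, unsigned long addr);
extern struct page *read_from_swap(unsigned long addr);

static struct page *handle_fault(unsigned long addr)
{
	struct page *p = find_on_reclaim_lists(addr);

	if (p) {
		/* Caught it before it was freed or written out:
		 * reclaiming costs no disk I/O at all. */
		remove_from_reclaim_list(p);
		map_page(p, addr);
		p->referenced = 1;	/* the remap is itself a reference */
		return p;
	}
	return read_from_swap(addr);	/* the slow path */
}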

When memory becomes low, a kernel process "pagedaemon" (I think) is
woken up. It first scans the "to-be-free-list", moving 'x'
pages from the head of the list into the kernel's free-list. By
taking from the head of the list, pages which have recently been added
are given a chance to be reclaimed by the process(es). If free memory
is becoming very low, then the dirty pages are written to disk and
moved to the end of the "to-be-free-list".
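
Continuing the sketch, the daemon's main loop might amount to
something like this (the thresholds and list helpers are again
invented):

#define PAGES_PER_WAKEUP 32	/* 'x' above - a tuning knob */
#define FREE_MIN 64		/* "very low" threshold, also tunable */

extern int nr_free_pages;
extern struct page *take_from_head(struct page **list);  /* NULL if empty */
extern void add_to_tail(struct page **list, struct page *p);
extern void write_to_disk(struct page *p);

extern struct page *to_be_freed_list;
extern struct page *dirty_list;
extern struct page *free_list;

static void pagedaemon(void)
{
	struct page *p;
	int x = PAGES_PER_WAKEUP;

	/* Take from the head: those are the oldest entries. Pages
	 * queued recently sit at the tail, so a process still has a
	 * chance to fault them back in before they are freed. */
	while (x-- && (p = take_from_head(&to_be_freed_list)))
		add_to_tail(&free_list, p);

	if (nr_free_pages < FREE_MIN) {
		/* Memory is very low: push dirty pages out to disk,
		 * then let them take their turn on the free path. */
		while ((p = take_from_head(&dirty_list))) {
			write_to_disk(p);
			add_to_tail(&to_be_freed_list, p);
		}
	}
}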

The "clock" is a "Not-Recently-Used" algorithm, and is not very
difficult to implement. It can run off the kernel's soft-interrupt.
The main difficulty is in "tuning":
    How many pages should be scanned each time (i.e. how far
    should the hands move)?

    What should the distance between the front and back hands be?
    If it is small, pages are quickly removed from a process
    (although they remain in memory, so they can easily be
    reclaimed).

    When should the pagedaemon be woken up? And how many pages
    should it move between the lists?

For processors which do not support the ref-bit in hardware, the
page-fault handler needs to be more efficient (as that is where the
bit is simulated). For a given process, most page-faults fall within
the same vm area as the previous one. Adding a caching-pointer to the
vm structure, holding the address of the last vm_struct which caused a
fault, would reduce the need to call find_vma() on each fault. If
I remember correctly, this is what OSF and SVR4 both do. (On a side
note, the same can be said for the inode page hash list, which could
be made circular, with the pointer in the hash table pointing to the
last page referenced on the list - not the first page, as is done now).
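
A sketch of the caching-pointer idea (field and function names
invented; the real structures differ):

struct vm_struct {
	unsigned long start, end;	/* [start, end) of the area */
	struct vm_struct *next;
};

struct mm {
	struct vm_struct *areas;	/* sorted, singly linked */
	struct vm_struct *last_fault;	/* the caching pointer */
};

/* The full (here, linear) search that we would like to avoid. */
static struct vm_struct *find_vma(struct mm *mm, unsigned long addr)
{
	struct vm_struct *vma;

	for (vma = mm->areas; vma; vma = vma->next)
		if (addr >= vma->start && addr < vma->end)
			return vma;
	return 0;
}

static struct vm_struct *find_vma_cached(struct mm *mm, unsigned long addr)
{
	struct vm_struct *vma = mm->last_fault;

	/* Most faults hit the same area as the previous one, so
	 * this one comparison usually replaces the whole search. */
	if (vma && addr >= vma->start && addr < vma->end)
		return vma;

	vma = find_vma(mm, addr);
	if (vma)
		mm->last_fault = vma;
	return vma;
}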

I can get you a full list of references tomorrow, but any Unix internals
book covers this subject. I think the best description is in the
"Magic Garden", can't remeber the author. I'm sure I also have a few
papers about page reclaiming around somewhere.

markhe