Proposal to fix PGD/PMD lazy deallocation

Perry Harrington (pedward@apsoft.com)
Sat, 6 Jun 1998 23:17:01 -0700 (PDT)


I exchanged a couple of emails with Alan today, and he enlightened me
on a few things. I have an idea for fixing the lazy deallocation of
page tables.

The idea involves 2 changes:

First, we'd add a pointer to the task_struct and allocate a page
of RAM upon task creation. That page would hold a table of usage
counts for the PGD entries, making them quick to clean up on the
x86 architecture (a rough sketch follows).
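
Something along these lines; the field name pgd_usage, the variable
tsk, and the exact allocation call are illustrative only, not existing
kernel code:

        /* New member in task_struct (sketch): a page-sized table of
         * per-PGD-entry usage counts for the user address space. */
        unsigned short *pgd_usage;

        /* At task creation, allocate one zeroed page to hold the table: */
        tsk->pgd_usage = (unsigned short *) get_free_page(GFP_KERNEL);
        if (!tsk->pgd_usage)
                return -ENOMEM;         /* or whatever the local error path is */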

Second, since Alphas and other 64-bit archs have a PMD that isn't
folded, you have to implement a check-on-zap loop that tests whether
a PMD is empty when zap is called:
if (address < end) {
        /* The zap range is guaranteed to cross into the next PMD,
         * so this PMD's page can be freed unconditionally. */
        pmd_free(pmd);
} else {
        int i;

        /* Small mapping, or the remnant of one: free the PMD only
         * if every entry in it is empty. */
        for (i = 0; i < PTRS_PER_PMD; i++) {
                if (!pmd_none(pmd[i]))
                        goto cont;
        }
        pmd_free(pmd);
cont:
        ;
}

The above frees a PMD if the boundary is guaranteed to change (address
is less than end, after address has been advanced by the amount of RAM
one PMD can track). The else branch handles small mappings or the
remnants of mappings.

You would only need the above on 3-tier architectures, but that's
justifiable:

- A 3-tier arch can represent more data per table.
- Because of that, fewer PMD checks would be needed.
- 3-tier processors typically have more horsepower, and they have
  instructions which would optimize the ugliness above.

Perhaps that justification isn't acceptable to some. The PGD tracking
solution is much more elegant: it covers the 90th percentile of Linux
users, and it targets the x86 arch, which typically has less RAM per
machine, so the savings are more valuable there.

Anyhow, each time we allocate a PTE within a PGD entry's range, we
increment the usage count at pgd_usage[pgd pointer >> SOME_SHIFT_CONSTANT].
When we free a PTE, we decrement it. In zap_page_range we check whether
the count is 0 and free the dir if it is. On the x86 we can only have
768 user-space PGD entries, so one page can represent all their usage
counts. On the 64-bit archs we may not be able to represent all the
PGDs in one page without bit shifting/encoding; please comment if you
can answer that.
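
To make that concrete, here is a minimal sketch of the helpers I'm
imagining for x86. The names (PGD_USAGE_SHIFT, USER_PGD_ENTRIES,
pgd_usage_inc/dec) are made up for illustration, and I've indexed by
virtual address, which on x86 gives the same PGD index as shifting the
pgd pointer above:

#define PGD_USAGE_SHIFT  22             /* one PGD entry maps 4 MB on x86 */
#define USER_PGD_ENTRIES 768            /* 3 GB of user space / 4 MB */

/* 'pgd_usage' is the page-sized table from the task_struct change;
 * 768 16-bit counters fit easily in one page, and a 4 MB region holds
 * at most 1024 PTEs, so 16 bits per counter is enough. */

/* Bump the count when a PTE is set up at 'address'. */
static inline void pgd_usage_inc(unsigned short *pgd_usage, unsigned long address)
{
        pgd_usage[address >> PGD_USAGE_SHIFT]++;
}

/* Drop the count when a PTE at 'address' is torn down; returns 1 when
 * the page table behind that PGD entry is now empty, so the caller
 * can free the dir. */
static inline int pgd_usage_dec(unsigned short *pgd_usage, unsigned long address)
{
        return --pgd_usage[address >> PGD_USAGE_SHIFT] == 0;
}

zap_page_range would call pgd_usage_dec() for each PTE it clears and
free the page table as soon as it returns 1.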

So, I can solve the x86 case quite easily; the 3-tier case involves an
order of magnitude more data and therefore may be better handled in an
arch-dependent way or via my default algorithm.

Please comment, as I'm very interested in solving this issue.

Thanks,

--Perry

-- 
Perry Harrington       Linux rules all OSes.    APSoft      ()
email: perry@apsoft.com 			Think Blue. /\
