Re: 2.1.78: mm and networking questions...

Colin Plumb (colin@nyx.net)
Thu, 8 Jan 1998 03:34:36 -0700 (MST)


> (1) There's just one "struct page" per physical page? And there's
> an array "mem_map" of these, indexed redundantly by
> MAP_NR(address) and by struct page::map_nr?

That is correct. The cached map_nr is there because it's fast to divide an
address by the page size (that's just a shift), but computing "page-mem_map"
involves a slow division by 52.
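
To make the two index paths concrete, here is a rough sketch in C. It is not
the kernel's code: struct page_sketch, the helper names and the 4K page size
are stand-ins I made up for illustration, and the real MAP_NR() also has to
deal with the kernel's virtual address offset, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12    /* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct page: 13 32-bit words = 52 bytes. */
struct page_sketch {
    uint32_t words[12];         /* placeholder for the other fields */
    uint32_t map_nr;            /* cached index of this entry in the array */
};

/* Address to index: dividing by the page size is a plain shift. */
static inline unsigned long addr_to_nr(unsigned long addr)
{
    return addr >> PAGE_SHIFT_SKETCH;
}

/* Entry to index by pointer subtraction: the compiler has to divide the
 * byte offset by sizeof(struct page_sketch), which is 52 here. */
static inline unsigned long
page_to_nr_by_subtraction(const struct page_sketch *map,
                          const struct page_sketch *page)
{
    return (unsigned long)(page - map);
}

/* Entry to index via the cached field: one load, no arithmetic. */
static inline unsigned long page_to_nr_cached(const struct page_sketch *page)
{
    return page->map_nr;
}
```

All three helpers give the same index for the same page; the only difference
is what arithmetic each one costs.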

> (2) What on earth is "mem_map_t" doing, and why should this alias for
> "struct page" exist?

No idea. I think it's confusing and a bad idea. If someone with more
seniority would give me a hint that a patch to remove the typedef would be
accepted, I'd make one in a second.

> (3) Would performance suffer horribly if the struct page were to have
> a more even (14 or 16) number of words in it, or would we get
> back performance by making the cache line boundaries fall in the
> right places?

Well, it would eat more memory if it were made larger, but yes, a
multiple of 4 words would be a good thing. Actually, GCC optimizes
division by 3 very well (it turns it into a multiply by 0x55555555,
which it in fact does better than a normal multiply), so perhaps
caching that value is a mistake.
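
For anyone who hasn't seen the multiply trick, here is a small sketch in C.
The constant a compiler actually picks depends on the word size and on
whether it knows the division is exact, so the values below are the textbook
ones and not a claim about GCC's output. The function names are mine.

```c
#include <stdint.h>

/* Unsigned 32-bit division by 3 as a widening multiply plus a shift.
 * 0xAAAAAAAB is (2^33 + 1) / 3, so the top bits of the 64-bit product
 * hold the quotient for every 32-bit n. */
static uint32_t div3_by_multiply(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}

/* Exact division by 13: when n is known to be a multiple of 13,
 * multiplying by the inverse of 13 modulo 2^32 yields n / 13.
 * 13 * 0xC4EC4EC5 == 10 * 2^32 + 1. */
static uint32_t exact_div13(uint32_t n)
{
    return n * 0xC4EC4EC5u;
}

/* A byte offset into an array of 52-byte entries is a multiple of
 * 52 = 4 * 13, so it can be turned into an index with a shift by 2
 * followed by the exact division by 13. */
static uint32_t byte_offset_to_index(uint32_t byte_offset)
{
    return exact_div13(byte_offset >> 2);
}
```

The exact-division form only works because a pointer difference within one
array is always a whole number of entries.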

Cache effects are non-trivial and reducing a struct page to 12 words
would make it line up much better.

> (4) Similarly to (1) I take it there's exactly one struct mm_struct per
> struct task_struct, and each of the struct vm_area_struct
> *mmap points to a chain of vma's unique to the task?

No. Threads share mm structures. (See kernel/fork.c, copy_mm() where
it checks CLONE_VM). mm->count is a reference count (see mmget() in
<linux/sched.h>).
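
The sharing scheme can be sketched roughly like this in C. None of this is
the kernel's code: the struct names, CLONE_VM_SKETCH and the helpers are
hypothetical, and the real copy_mm() also has to duplicate the vma list and
page tables in the non-shared case, which is skipped here.

```c
#include <stdlib.h>

#define CLONE_VM_SKETCH 0x100UL     /* stand-in for the real clone flag */

struct mm_sketch {
    int count;                      /* number of tasks using this mm */
};

struct task_sketch {
    struct mm_sketch *mm;
};

/* Allocate a fresh mm with a single user. */
static struct mm_sketch *mm_alloc_sketch(void)
{
    struct mm_sketch *mm = malloc(sizeof(*mm));
    if (mm)
        mm->count = 1;
    return mm;
}

/* Threads created with the shared-VM flag take another reference on the
 * parent's mm; everything else gets an mm of its own. */
static int copy_mm_sketch(unsigned long flags,
                          struct task_sketch *parent,
                          struct task_sketch *child)
{
    if (flags & CLONE_VM_SKETCH) {
        parent->mm->count++;
        child->mm = parent->mm;
        return 0;
    }
    child->mm = mm_alloc_sketch();
    return child->mm ? 0 : -1;
}

/* Drop one reference; returns 1 if that was the last user and the mm
 * was freed, 0 otherwise. */
static int mm_put_sketch(struct mm_sketch *mm)
{
    if (--mm->count == 0) {
        free(mm);
        return 1;
    }
    return 0;
}
```

So the vma chain hangs off the mm, not the task, and two threads that share
an mm see the same chain.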

> (5) When we start to swap a page out to disk, if the process wants
> to write to that page, what happens? I can't find anything
> to prevent the access, nor can I find anything that would
> notice such an access, until the disk I/O completes and the
> page gets replaced or hits the swap cache...

Um, the code in mm/vmscan.c:try_to_swap_out sure looks like it clears the
page table entry (PTE) before swapping out. get_swap_page returns a swap
entry, which is installed into the page table as a not-present PTE, and
then the swapout is done. Any write after that point takes a page fault.
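
The ordering matters more than the details, so here is a toy model of it in
C. It is an assumption-laden sketch, not the vmscan.c code: the mapping
entry is modelled as a 32-bit word with a present bit in bit 0, the swap
slot encoding is invented, and the TLB flush and the I/O are only marked by
comments and a flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u          /* assumed position of the present bit */

typedef uint32_t entry_sketch_t;

/* Encode a swap slot as a not-present entry (present bit left clear). */
static entry_sketch_t make_swap_entry(uint32_t slot)
{
    return slot << 1;
}

/* An access only goes through without a fault if the entry is present. */
static bool access_ok(const entry_sketch_t *entry)
{
    return (*entry & ENTRY_PRESENT) != 0;
}

/* Swap-out in the order described: unmap first, start the I/O last. */
static void swap_out_sketch(entry_sketch_t *entry, uint32_t slot,
                            bool *io_started)
{
    *entry = make_swap_entry(slot);     /* 1. replace the mapping */
    /* 2. a real kernel would flush the stale TLB entry here */
    *io_started = true;                 /* 3. only now queue the write */
}
```

Because the mapping is gone before the write is queued, a process touching
the page during the I/O faults instead of silently modifying it.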

Another valid alternative just sets the page clean before the swap out,
and when the I/O completes, if it was dirtied, I guess that wasn't
a real good page to swap out...

After this, you get out of my depth. I know that Linus has been resisting
reverse page maps for a while, since a linked list through all the PTEs
showing all the users of a given page doubles the size of the page tables
and causes all kinds of second-order performance problems.

-- 
	-Colin