Re: 2.1.78: mm and networking questions...

kwrohrer@enteract.com
Thu, 8 Jan 1998 05:17:00 -0600 (CST)


And lo, Colin Plumb saith unto me:
> > (1) There's just one "struct page" per physical page? And there's
> > an array "mem_map" of these, indexed redundantly by
> > MAP_NR(address) and by struct page::map_nr?
> That is correct. It's there because it's fast to divide an address by pleasure
> computing "page-mem_map" involves a slow division by 52.
"pleasure computing"? Runaway spell checker or Freudian slip?
Also, mem_map[] is the *only* source of struct pages?

> > (2) What on earth is "mem_map_t" doing, and why should this alias for
> > "struct page" exist?
> No idea. I think it's confusing and a bad idea. If someone with more
> seniority would give me a hint that a patch to remove the typedef would be
> accepted, I'd make one in a second.
It doesn't appear often, and seems to imply a context (we have comments
to do that) of the struct page managing more than a single page. It's
used in mm/page_alloc.c and mm/memory.c (plus include/linux/mm.h) AFAgrepK.

> > (3) Would performance suffer horribly if the struct page were to have
> > a more even (14 or 16) number of words in it, or would we get
> > back performance by making the cache line boundaries fall in the
> > right places?
>
> Well, it would eat more memory if it were made larger, but yes, a
> multiple of 4 words would be a good thing. Actually, GCC optimizes
> division by 3 very well (it turns it into a multiply by 0x55555555,
> which it in fact does better than a normal multiply), so perhaps
> cacheing that value is a mistake.
>
> Cache effects are non-trivial and reducing a struct page to 12 words
> would make it line up much better.
If I didn't plan to add anything, age could become unsigned short,
as could flags AFAIK (only 11 bits are used), which would bring us
down to 12 longs worth on x86 and 68k at least... OTOH, someone could
also try padding it out to 16 words, benching before and after; losing
12/4096 of memory shouldn't be a major penalty on any but the smallest
machines...

> > (4) Similarly to (1) I take it there's exactly one struct mm_struct per
> > struct task_struct, and each of the struct vm_area_struct
> > *mmap points to a chain of vma's unique to the task?
> No. Threads share mm structures. (See kernel/fork.c, copy_mm() where
> it checks CLONE_VM). mm->count is a reference count (see mmget() in
> <linux/sched.h>).
As long as threads share the whole of the mm stuff, page tables included,
I'm not worried there; as long as there's exactly one struct vm_area_struct
per VM area per heavyweight process, I'll be fine...

> > (5) When we start to swap a page out to disk, if the process wants
> > to write to that page, what happens? I can't find anything
> > to prevent the access, nor can I find anything that would
> > notice such an access, until the disk I/O completes and the
> > page gets replaced or hits the swap cache...
> Um, the code in mm/vmscan.c:try_to_swap_out sure looks like it clears the
> TLB entry before swapping out. get_swap_page returns a TLB entry for
> a not-present page, which is installed into the TLB and then the swapout
> is done.
Ah, there it is. Knew it had to be somewhere, was looking in ll_rw_page,
but never followed things all the way down the maze of twisty little
functions, all different. (Not as bad-looking as the maze of little
different functions, all twisty, that is slab.c, but OTOH I didn't miss
important stuff in slab.c that I know of, yet.)

> Another valid alternative just sets the page clean before the swap out,
> and when the I/O completes, if it was dirtied, I guess that wasn't
> a real good page to swap out...
That's what I'd do if I had to write it from scratch.

> After this, you get out of my depth. I know that Linus has been resisting
> reverse page maps for a while, since a linked list through all the TLBs
> showing all the users of a given page doubles the size of the TLBs and
> causes all kinds of second-order performance problems.
TLB? I know I'll need to be able to do a struct page -> pte(s) lookup,
which would of course include any ptes cached in soft TLBs, but I was
hoping I could manage this linkage at, say, the vma level--after all,
that's part of what they're for, no? (Obviously, they can't go in with
the struct pages...) This may take a little arithmetic per vma involved
in the sharing...

Keith