Re: [PATCH] Compound page overhaul

From: David Howells
Date: Mon Nov 22 2004 - 23:32:04 EST



William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:

> > (1) A new bit flag PG_compound_slave has been added. This is used to mark
> > ...
> There are a lot of ways to do these things. Most of it is bitpacking
> and dodging assumptions in other code about various fields always being
> something or other they expect (e.g. bh's vs. page->private).

I want to avoid putting magic numbers in page->private. What goes in there
could be anything, as it's up to the filesystem.

Do you have any preferences? I'd prefer to use page->mapping, I think, except
that's used for the destructor.

I should probably make a set-destructor function for hugetlbfs to call.

> A generally innocuous rearrangement. Some explanation of the advantage
> of these new bitpacking and field arrangements over the current
> arrangement may be good to have.

The only differences are:

(1) PG_compound_slave now exists.

(2) I'm permitting the owner of the page to do do what it will with
page->private on the first page.

> > (3) __page_first() is now provided to find the first page of any page set
> > (even single page sets).
>
> I have to question the underscores.

The underscores can be dropped if they're not wanted.

> Also, there's a commonly-used term in the superpage literature, ``head of
> the superpage'', that may be more easily recognizable for readers familiar
> with that but not Linux specifics, but that's just nomenclature and not
> particularly pressing or any kind of requirement, just a non-Linux
> precedent.

I couldn't think of a good name for it, so I settled on it being the first
page.

How about page_head()?

> __GFP_COMP was introduced because several unusual drivers allocate
> higher-order pages and then move on to free fragments of them. There's
> a small danger some others may allocate higher-order pages and then
> treat each piece as a separate entity (particularly in the freeing pass).

I wasn't aware of that. Looking at the mm code, doing a fragmentary release
would cause bad_page() to be invoked. Presumably these drivers modify the
various struct pages involved directly to keep the allocator happy.

It would be better, I think, to provide a page splitter function. Thus
allowing pages to be cut in half, and then have the two halves made into the
equivalent allocated pages.

> Sweeping affected drivers to use a fragmenting primitive may help here.

Do you know which drivers?

> > (6) compound_page_order() is now available. This will indicate the order
> ...
> Possible, but it's likely a micro-optimization to cache the order in
> registers across function calls. The allocator is something of a ``hot
> path'' and small alterations can have noticeable effects.

Yes... but the order gets examined anyway in the free page checker, and the
second plus page structs get modified too, so I don't think it'll make much of
a difference. Plus the filesystem or driver that owned the page won't need to
keep track of the size, nor will it need to calculate it.

> > (7) Trailing spaces have been cleaned up on lines in page_alloc.c.
>
> I like this quite a bit. =)

(defun trim ()
"Delete trailing whitespace in buffer"
(interactive)
(save-excursion
(goto-char (point-min))
(replace-regexp "[ \t\r]+$" "")
(goto-char (point-max))
(skip-chars-backward "\n")
(if (not (eobp))
(delete-region (1+ (point)) (point-max)))))

> > (8) bootmem.c now has a separate path to release pages to the main
> > allocator
> ...
> Clearly it could merely scan the bitmap for the largest properly-sized,
> properly-aligned leading run of free bits beyond even that, though I
> wouldn't expect you to pursue that as it's far beyond the scope of the
> patch. I was hit up to deal with bootmem.c issues, and will be looking
> into that and more after the current set of bootmem changes has settled
> down and ia64 bootstrap has been stable for a while.

I may look at doing this after this patch (or similar) goes in. If so, I'll
send you the patch.

> (2) The physaddr alignment comment in bootmem.c is mangled. It's not
> O(LOG2(BITS_PER_LONG)) -aligned, it's exactly LOG2(BITS_PER_LONG)
> aligned. But we don't have a LOG2(...) macro, we have fls()/ffs().

I suspect that's meant to be mathematical notation, not strictly compilable
code, though I think there may be a missing "if" in it.

> (3) page_count() probably deserves the %0*lx treatment in __bad_page().
> Conserving screenspace when possible helps some, though that's
> offset a bit against predictable field alignment. Maybe putting
> variable-length fields at the end of the line would help.

I don't think that matters too much. This message should never be seen, after
all...

That said, I think I should probably provide a 64-bit version too... some of
the fields will have 16-char widths there.

> Also, the pfn would be great to have there, too, while you're at it.

Okay.

> (4) I wonder if anyone's run with CONFIG_DEBUG_PAGEALLOC recently.
> bootmem.c seems a bit early for kernel_map_pages() et al.
> It could be okay depending.

I can try it. BTW, should the free page checking be contingent on this option?
Or maybe it should have its own option.

> (5) This patch does a fair number of different things and it takes a
> bit of effort to wade through some of the longer rearrangements
> as they overflow 80x24. It would be helpful for reviewers if you
> could break this down into a somewhat more easily-digestible
> series of smaller patches.

It's a little tricky to break it up logically since it's mostly incredibly
interrelated.

I could separate out some of the cleanups: rearrangement between files,
trailing space splatting.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/