Re: [PATCH v4 00/15] Fast kernel headers: split linux/mm.h

From: David Hildenbrand
Date: Tue Mar 12 2024 - 15:26:22 EST


On 12.03.24 19:11, Max Kellermann wrote:
On Tue, Mar 12, 2024 at 5:33 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
Just curious: why? Usually build time, do you have some numbers?

(This has been discussed so often and I thought having smaller header
dependencies was already established as general consensus among kernel
devs, and everybody agreed that mm.h and kernel.h are a big mess that
needs cleanup.)

It's always a good idea to include at least some sentences about history+motivation in the cover letter.


Why: Correctness, less namespace pollution, and the least important
aspect is reduced build times. However, the gains by this patch set
are very small; each optimization gains very little.

Some time ago, I posted a much larger patch set with a few numbers:
https://lore.kernel.org/lkml/20240131145008.1345531-1-max.kellermann@xxxxxxxxx/
- the speedup was measurable, but not amazing, because even after that
patch set, everybody still includes everything, and much more cleanup
is needed to make a bigger difference. Once the big knots like
kernel.h and mm.h are broken up, we will have better results. And as I
said: build times are nice, but the lesser advantage.

Okay, so "Fast kernel headers:" is a bit misleading :P

I'm asking these stupid questions because I remember some header splitup (from Christoph?) that really did make a significant build-time difference.

Reading all that, I would have written something like this in the cover letter:

"There is an agreement that we want to split up some of the big kernel headers, such as kernel.h and mm.h. While not delivering immediate build-time gains, it's one step into the desired direction. Besides build-time, smaller headers promise less namespace pollution and correctness."

(although I don't immediately understand what correctness means in this context)


Anyway, this first patch set was so extremely large that nobody was
able to review it. So this patch contains just the parts that deal
with mm.h; later, when this patch set is merged, I can continue with
other headers. (I already have a branch that splits kernel.h and I'll
submit it eventually, after the mm.h cleanup.)

I'd have continued with

"Some of these changes were part of a previous, bigger patchset [1]. I'm now sending out smaller, reviewable pieces. Splitting up mm.h is the first step."

[1] https://lore.kernel.org/lkml/20240131145008.1345531-1-max.kellermann@xxxxxxxxx/



I'm not against splitting out stuff. But one function per header is a
bit excessive IMHO.

One function per header is certainly not my goal and I agree it's
excessive; but folio_next() in its own header made sense, just in this
special case, because it allowed removing the mm.h dependency from
bio.h. Removing this dependency was the goal, and folio_next.h was the
solution for this particular problem.

Okay, I see. I still wonder if merging some of the new folio headers in "folio.h" wouldn't have achieved something close to that. Sure, they bring in some other dependencies, but hopefully not mm.h

Anyhow, what you are describing here would also have been reasonable to describe somewhere in few words. IOW, which approach did you chose to decide when a new header was reasonable.


Ideally, we'd have some MM guideline that we'll be
able to follow in the future. So we don't need "personal taste".

Agree. But lacking such a guideline, all I can do is make suggestions
and submit patches for review, trying to follow what seemed to be
consensus in previous cleanups and what had previously been merged.

Possibly it makes sense to have some rough guideline. The problem I see is that once your stuff is merged, it will start getting messed up again over time. But maybe that's just the way it works.


For example, if I were to write a folio_prev(), should I put it in
include/linux/mm/folio_prev.h ? Likely we'd put it into folio_next.h,
but then the name doesn't make any sense.

True. But since no folio_prev() function exists, the only name that
made sense for this header was folio_next.h. If folio_prev() gets
added, I'd say put it in that header, but rename it to, let's say,
"folio_iterator.h". But right now, with just this one function (and
nothing else like it that could be added here), I decided to suggest
naming it "folio_next.h". If you think "folio_iterator.h" or something
else is better, I'll rename and resubmit; names don't matter so much
to me; the general idea of cleaning up the headers is what's
important.

The point I am trying to make: if there was a single folio_ops.h, it
would be clearer where it would go.

Maybe, but IMO this wouldn't be a good place to add folio_next(),
because folio_next() doesn't "operate" on a folio, but on a folio
pointer (or "iterator"). Again, names don't matter to me -
ideas/concepts matter. I'm only explaining why I decided to submit my
patches that way, but I'll change them any way you people want.

Thanks for the details.

--
Cheers,

David / dhildenb