Re: [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount

From: Linus Torvalds
Date: Mon Nov 21 2022 - 12:24:03 EST


On Mon, Nov 21, 2022 at 8:59 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> Is there a plan to remove lock_page_memcg() altogether which I missed? I
> am planning to make lock_page_memcg() a nop for cgroup-v2 (as it shows
> up in the perf profile on exit path)

Yay. It seems I'm not the only one hating it.

> but if we are removing it then I should just wait.

Well, I think Johannes was saying that at least the case I disliked
(the rmap removal from the page table tear-down - I strongly suspect
it's the one you're seeing on your perf profile too) can be removed
entirely as long as it's done under the page table lock (which my
final version of the rmap delaying still was).

See

https://lore.kernel.org/all/Y2llcRiDLHc2kg%2FN@xxxxxxxxxxx/

for his preliminary patch.

That said, if you have some patch to make it a no-op for _other_
reasons, and could be done away with _entirely_ (not just for rmap),
then that would be even better. I am not a fan of that lock in
general, but in the teardown rmap path it's actively horrifying
because it is taken one page at a time. So it's taken a *lot*
(although you might not see it if all you run is long-running
benchmarks - it's mainly the "run lots of small scripts" case that
really hits it).

The reason it seems to be so horrifyingly noticeable on the exit path
is that the fork() side already does the rmap stuff (mainly
__page_dup_rmap()) _without_ having to do the lock_page_memcg() dance.

So I really hate that lock. It's completely inconsistent, and it all
feels very wrong. It seemed entirely pointless when I was looking at
the rmap removal path for a single page. The fact that both you and
Johannes seem to be more than ready to just remove it makes me much
happier, because I've never actually known the memcg code enough to do
anything about my simmering hatred.

Linus