Re: [PATCH v4 1/7] mm/mremap: Optimize the start addresses in move_page_tables()

From: Lorenzo Stoakes
Date: Tue Jun 27 2023 - 14:02:39 EST


On Tue, Jun 27, 2023 at 01:56:09PM -0400, Liam R. Howlett wrote:
[snip]
> > > How about something like:-
> > >
> > > return find_vma_intersection(vma->mm, addr_masked, vma->vm_start) == NULL;
> > >
> > > Which explicitly asserts that the range in [addr_masked, vma->vm_start) is
> > > empty.
> > >
> > > But actually, we should be able to go further and replace the previous
> > > check with:-
> > >
> > > return find_vma_intersection(vma->mm, addr_masked, addr_to_align) == NULL;
> > >
> > > Which will fail if addr_to_align is offset within the VMA.
> >
> > Your suggestion would mean that we do a full VMA search starting from the
> > root. That would not be a nice thing if say we've 1000s of VMAs?
> >
> > Actually Liam told me to use find_vma_prev() because given a VMA, the maple
> > tree would not have to work that hard for the common case to find the
> > previous VMA. Per conversing with him, there is a chance we may have to go
> > one step above in the tree if we hit the edge of a node, but that's not
> > supposed to be the common case. In previous code, the previous VMA could
> > just be obtained using the "previous VMA" pointer, however that pointer has
> > been remove since the maple tree changes and given a VMA, going to the
> > previous one using the maple tree is just as fast (as I'm told).
>
> I think there's been a bit of a miscommunication on that..
>
> If you have already found the VMA and are using the maple state, then
> it's very little effort to get the next/prev. Leaf nodes can hold 16
> entries/NULL ranges, so the chances to go to the next/prev is usually in
> the cpu cache already.. if you go up a level in the tree, then you will
> have 10 nodes each with 16 entries each, etc, etc.. So the chances of
> being on an edge node and having to walk up multiple levels to get to
> the prev/next becomes rather rare.. and if you've just walked down, the
> nodes on the way up will still be cached.
>
> Here, you're not using the maple state but searching for an address
> using find_vma_prev(), but internally, that function does use a maple
> state to get you the previous. So you are looking up the VMA from the
> root, but the prev will very likely be in the CPU cache.
>
> Assuming the worst case tree (each VMA has a gap next to it, not really
> going to happen as they tend to be grouped together), then we are
> looking at a 4 level tree to get to 8,000 VMAs. 5 levels gets you a
> minimum 80,000. I've never seen a tree of height 6 in the wild, but you
> can fit 1.6M to 800K in one.
>
> I think the code is fine, but I wanted to clarify what we discussed.

Would the same apply to find_vma_intersection(), as they equally searches
from the root and allows the code to be made fairly succinct?

I really am not a huge fan of find_vma_prev() searching for a VMA you
already have just to get the previous one... would at lesat like to use
vma_prev() on a newly defined vmi, but if find_vma_intersection() is fine
then can reduce code to this.

[snip]