Re: [PATCH 0/5] mm/vmalloc.c: improve readability and rewrite vmap_area

From: oddtux
Date: Tue Jul 02 2019 - 08:18:17 EST


Uladzislau Rezki <urezki@xxxxxxxxx> ä2019å7æ1æåä äå6:11åéï
>
> On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote:
> > On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > > Hi,
> > >
> > > This series of patches is to reduce the size of struct vmap_area.
> > >
> > > Since the members of struct vmap_area are not being used at the same time,
> > > it is possible to reduce its size by placing several members that are not
> > > used at the same time in a union.
> > >
> > > The first 4 patches did some preparatory work for this and improved
> > > readability.
> > >
> > > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> > >
> > > More details can be obtained from the commit message.
> >
> > None of the commit messages talk about the motivation. Why do we want to
> > add quite some code to achieve this? How much do we save? This all
> > should be a part of the cover letter.
> >
> > > Thanks,
> > >
> > > Pengfei
> > >
> > > Pengfei Li (5):
> > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > > mm/vmalloc.c: Introduce a wrapper function of
> > > insert_vmap_area_augment()
> > > mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> > >
> > > include/linux/vmalloc.h | 28 +++++---
> > > mm/vmalloc.c | 144 +++++++++++++++++++++++++++-------------
> > > 2 files changed, 117 insertions(+), 55 deletions(-)
> > >
> > > --
> > > 2.21.0
>
> > > Pengfei Li (5):
> > > mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > > mm/vmalloc.c: Introduce a wrapper function of
> > > insert_vmap_area_augment()
> > > mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > > mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > > mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about
> that and i have patches in my pipeline to send out but implementation is different.
>
> I had a look at all 5 patches. What you are doing is reasonable to me, i mean when
> it comes to the idea of reducing the size to L1 cache line.
>

Thank you for your review.

> I have a concern about implementation and all logic around when we can use va_start
> and when it is something else. It is not optimal at least to me, from performance point
> of view and complexity. All hot paths and tree traversal are affected by that.
>
> For example running the vmalloc test driver against this series shows the following
> delta:
>
> <5.2.0-rc6+>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec
> Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec
> <5.2.0-rc6+>
>
> <this series>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec
> Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec
> <this series>
>
> For example, the degrade in second test is ~15%.
>
> --
> Vlad Rezki

Hi, Vlad

I think the reason for the performance degradation is that the value
of va_start is obtained by va->vm->addr.

And since the vmap area in the BUSY tree is always page-aligned,
there is no reason for _va_vmlid to override va_start, just let
the va->flags use the bits that lower than PAGE_OFFSET.

I will use this implementation in the next version and show almost
no performance penalty in my local tests.

I will send the next version soon.

Thank you for taking your time for the review.

Best regards,

Pengfei