Re: [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

From: Baoquan He
Date: Tue Nov 22 2022 - 22:40:25 EST


On 11/18/22 at 08:01am, Matthew Wilcox wrote:
> On Wed, Nov 09, 2022 at 11:35:34AM +0800, Baoquan He wrote:
> > Currently, vread() can read out vmalloc areas which is associated with
> > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > interface because it doesn't allocate a vm_struct. Then in vread(),
> > these areas will be skipped.
> >
> > Here, add a new function vb_vread() to read out areas managed by
> > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > and handle them respectively.
>
> i don't understand how this deals with the original problem identified,
> that the vread() can race with an unmap.

Thanks for checking.

I wrote a paragraph, then realized I had misunderstood your concern. You
are referring to Uladzislau's comment on my original draft patch, right?
I'm pasting the link to Uladzislau's reply here in case other people want
to know the background:
https://lore.kernel.org/all/Y1uKSmgURNEa3nQu@pc636/T/#u

When Stephen raised the issue originally, I posted the draft patch below
to try to fix it:
https://lore.kernel.org/all/Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv/T/#u

In the above draft patch, I tried to differentiate a normal vmalloc area
from a vm_map_ram area by the fact that a vmalloc area is associated with
a vm_struct, while a vm_map_ram area has ->vm set to NULL. I thought the
only difference was that a normal vmalloc area has a guard page, so its
size needs to account for the guard page, while a vm_map_ram area has no
guard page, so only its actual size matters. Uladzislau's comment
reminded me that I was wrong, and that what we need to handle goes beyond
that.

Currently there are three kinds of vmalloc areas in the kernel:

1) normal vmalloc areas, each associated with a vm_struct, which is
allocated in __get_vm_area_node(). When freeing, ->vm is set to NULL
first, then the vmap_area is unmapped and freed, see remove_vm_area().

2) areas allocated via vm_map_ram() whose size is larger than
VMAP_MAX_ALLOC. The entire area is not associated with a vm_struct, and
it is freed in one go in vm_unmap_ram(), which unmaps it and frees the
vmap_area;

3) areas allocated via vm_map_ram(), which delegates to vb_alloc() when
size <= VMAP_MAX_ALLOC. The vmap_area is allocated in one go with a size
of VMAP_BLOCK_SIZE, then split up and handed out later through
vb_alloc() and freed via vb_free(). When the entire area is dirty, it is
unmapped and freed.
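
To make the three kinds above concrete, here is a rough userspace sketch
of how an area ends up in each category. It is not kernel code: the
demo_* names are made up for illustration, and the value used for
DEMO_VMAP_MAX_ALLOC is only an assumed stand-in for VMAP_MAX_ALLOC.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; the kernel defines its own VMAP_MAX_ALLOC. */
#define DEMO_VMAP_MAX_ALLOC 64u

/* Hypothetical labels for the three kinds of area described above. */
enum demo_area_kind {
	DEMO_VMALLOC,     /* 1) normal vmalloc area, has a vm_struct */
	DEMO_VM_MAP_RAM,  /* 2) vm_map_ram() area with its own vmap_area */
	DEMO_VMAP_BLOCK   /* 3) vm_map_ram() area carved out of a vmap_block */
};

/*
 * Classify an area by how it was created.
 * via_vm_map_ram: non-zero if the area came from vm_map_ram().
 * size: requested size, in the same unit as DEMO_VMAP_MAX_ALLOC.
 */
enum demo_area_kind demo_classify(int via_vm_map_ram, unsigned int size)
{
	assert(size > 0);

	if (!via_vm_map_ram)
		return DEMO_VMALLOC;

	/* Small requests are delegated to the vmap_block path (vb_alloc()). */
	if (size <= DEMO_VMAP_MAX_ALLOC)
		return DEMO_VMAP_BLOCK;

	return DEMO_VM_MAP_RAM;
}
```

For example, demo_classify(1, DEMO_VMAP_MAX_ALLOC) lands in kind 3),
while demo_classify(1, DEMO_VMAP_MAX_ALLOC + 1) lands in kind 2).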

Based on the above facts, we need to add flags to differentiate the
normal vmalloc area from the vm_map_ram area, namely areas 1) and 2). We
also need flags to differentiate areas 2) and 3), because areas of kind
3) are pieces of an entire vmap_area: vb_free() will unmap the piece and
mark that part dirty, but the entire vmap_area will be kept. So when we
read an area of kind 3), we need to take vb->lock and read out only the
still-mapped part, not the dirty or free parts of the vmap_area.
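
A minimal userspace model of that last point, in case it helps the
discussion. This is only a sketch, not the patch itself: struct
demo_vmap_block, used_map, the tiny page size and the spinning lock are
all assumptions made for illustration, and zero-filling the parts that
are not mapped is my choice for the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE   16u  /* tiny "page" to keep the example small */
#define DEMO_BLOCK_PAGES 8u   /* pages per block, stand-in for VMAP_BLOCK_SIZE */

/* Hypothetical stand-in for struct vmap_block. */
struct demo_vmap_block {
	atomic_flag lock;        /* stand-in for vb->lock */
	unsigned long used_map;  /* bit i set: page i is still mapped */
	unsigned char mem[DEMO_BLOCK_PAGES * DEMO_PAGE_SIZE];
};

void demo_lock(struct demo_vmap_block *vb)
{
	while (atomic_flag_test_and_set_explicit(&vb->lock, memory_order_acquire))
		; /* spin until the lock is released */
}

void demo_unlock(struct demo_vmap_block *vb)
{
	atomic_flag_clear_explicit(&vb->lock, memory_order_release);
}

/* Model of vb_free(): the piece is unmapped, the block itself stays. */
void demo_vb_free(struct demo_vmap_block *vb, unsigned int page)
{
	assert(page < DEMO_BLOCK_PAGES);
	demo_lock(vb);
	vb->used_map &= ~(1UL << page);
	demo_unlock(vb);
}

/*
 * Read the whole block into buf (DEMO_BLOCK_PAGES * DEMO_PAGE_SIZE bytes).
 * Pages that are still mapped are copied; all other pages are zero-filled.
 * Returns the number of pages actually copied.
 */
unsigned int demo_vb_vread(struct demo_vmap_block *vb, unsigned char *buf)
{
	unsigned int copied = 0;

	demo_lock(vb); /* a concurrent demo_vb_free() cannot change used_map now */
	for (unsigned int i = 0; i < DEMO_BLOCK_PAGES; i++) {
		size_t off = (size_t)i * DEMO_PAGE_SIZE;

		if (vb->used_map & (1UL << i)) {
			memcpy(buf + off, vb->mem + off, DEMO_PAGE_SIZE);
			copied++;
		} else {
			memset(buf + off, 0, DEMO_PAGE_SIZE);
		}
	}
	demo_unlock(vb);

	return copied;
}
```

Holding the lock across the whole walk is what keeps the reader from
racing with a free of one piece: the bitmap it checks cannot change
under it until the copy is done.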

Thanks
Baoquan