Re: [PATCH 00/22] Don't use kmalloc() with GFP_DMA

From: Baoquan He
Date: Tue Feb 22 2022 - 08:27:06 EST


On 02/22/22 at 09:12pm, Baoquan He wrote:
> On 02/22/22 at 09:44am, Christoph Hellwig wrote:
> > On Mon, Feb 21, 2022 at 02:57:34PM +0100, Heiko Carstens wrote:
> > > > 1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390;
> > >
> > > So, s390 partially requires GFP_DMA allocations for memory areas which
> > > are required by the hardware to be below 2GB. There is not necessarily
> > > a device associated when this is required. E.g. some legacy "diagnose"
> > > calls require buffers to be below 2GB.
> > >
> > > How should something like this be handled? I'd guess that the
> > > dma_alloc API is not the right thing to use in such cases. Of course
> > > we could say, let's waste memory and use full pages instead, however
> > > I'm not sure this is a good idea.
> >
> > Yeah, I don't think the DMA API is the right thing for that. This
> > is one of the very rare cases where a raw allocation makes sense.
> >
> > That being said being able to drop kmalloc support for GFP_DMA would
> > be really useful. How much memory would we waste if switching to the
> > page allocator?
> >
> > > s390 drivers could probably converted to dma_alloc API, even though
> > > that would cause quite some code churn.
> >
> > I think that would be a very good thing to have.
> >
> > > > For this first patch series, thanks to Hyeonggon for helping
> > > > reviewing and great suggestions on patch improving. We will work
> > > > together to continue the next steps of work.
> > > >
> > > > Any comment, thought, or suggestoin is welcome and appreciated,
> > > > including but not limited to:
> > > > 1) whether we should remove dma-kmalloc support in kernel();
> > >
> > > The question is: what would this buy us? As stated above I'd assume
> > > this comes with quite some code churn, so there should be a good
> > > reason to do this.
> >
> > There is two steps here. One is to remove GFP_DMA support from
> > kmalloc, which would help to cleanup the slab allocator(s) very nicely,
> > as at that point it can stop to be zone aware entirely.
> >
> > The long term goal is to remove ZONE_DMA entirely at least for
> > architectures that only use the small 16MB ISA-style one. It can
> > then be replaced with for example a CMA area and fall into a movable
> > zone. I'd have to prototype this first and see how it applies to the
> > s390 case. It might not be worth it and maybe we should replace
> > ZONE_DMA and ZONE_DMA32 with a ZONE_LIMITED for those use cases as
> > the amount covered tends to not be totally out of line for what we
> > built the zone infrastructure.
> >
> > > >From this cover letter I only get that there was a problem with kdump
> > > on x86, and this has been fixed. So why this extra effort?
> > >
> > > > 3) Drop support for allocating DMA memory from slab allocator
> > > > (as Christoph Hellwig said) and convert them to use DMA32
> > > > and see what happens
> > >
> > > Can you please clarify what "convert to DMA32" means? I would assume
> > > this does _not_ mean that passing GFP_DMA32 to slab allocator would
> > > work then?
> >
> > I'm really not sure what this means.
>
> Thanks a lot to Heiko for valuable input, it's very helpful. And thanks
> a lot to Christoph for explaining.
>
> I guess this "convert to DMA32" is similar to "replace ZONE_DMA and
> ZONE_DMA32 with a ZONE_LIMITED".

And by the way, when I searched SLAB_CACHE_DMA32 which is another zone
aware slab flag, I got that not all people likes to abuse
kmalloc(GFP_DMA). There are two places where
kmem_cache_create(SLAB_CACHE_DMA32) are called to create slab grabbing
memory from zone DMA32. Obviously the code author really knows slab
allocator. They use dma32 slab to get cache memory under 4G.

drivers/firmware/google/gsmi.c : gsmi_init()
drivers/iommu/io-pgtable-arm-v7s.c: arm_v7s_alloc_pgtable()

>
> When I use 'git grep "GFP_DMA/>"' to search all places specifying GFP_DMA,
> I noticed the main usage of kmalloc(GFP_DMA) is to get memory under a
> memory limitation, but not for DMA buffer allocation. Below is what I got
> for earlier kdump issue explanation. It can help explain why kmalloc(GFP_DMA)
> is useful on ARCHes w/o ZONE_DMA32, but doesn't make sense on x86_64 which
> has both zone DMA and DMA32. The 16M ZONE_DMA is only for very rarely used
> legacy ISA device, but most pci devices driver supporting 32bit addressing
> likes to abuse kmalloc(GFP_DMA) to get DMA buffer from the zone DMA.
> That obviously is unsafe and unreasonable.
>
> Like risc-V which doesn't have the burden of legacy ISA devices, it can
> take only containing DMA32 zone way. ARM64 also adjusts to have only
> arm64 if not on Raspberry Pi. Using kmalloc(GFP_DMA) makes them no
> inconvenience. If finally having dma32-kmalloc, the name may need be
> carefully considerred, it seems to be acceptable. We just need to pick
> up those ISA device driver and handle their 24bit addressing DMA well.
>
> For this patchset, I only find out places in which GPF_DMA is
> redundant and can be removed directly, and places where
> kmalloc(GFP_DMA)|dma_map_ pair can be replaced with dma_alloc_xxxx() API
> and the memory wasting is not so big. I have patches converting
> kmalloc(GFP_DMA) to alloc_pages(GFP_DMA), but not easy to replace with
> dma_alloc_xxx(), Hyeonggon suggested not adding them to this series.
> I will continue investigating the left places, see whether or how we can
> convert them.
>
> =============================
> ARCH which has DMA32
> ZONE_DMA ZONE_DMA32
> arm64 0~X X~4G (X is got from ACPI or DT. Otherwise it's 4G by default, DMA32 is empty)
> ia64 None 0~4G
> mips 0 or 0~16M X~4G (zone DMA is empty on SGI_IP22 or SGI_IP28, otherwise 16M by default like i386)
> riscv None 0~4G
> x86_64 16M 16M~4G
>
>
> =============================
> ARCH which has no DMA32
> ZONE_DMA
> alpha 0~16M or empty if IOMMU enabled
> arm 0~X (X is reported by fdt, 4G by default)
> m68k 0~total memory
> microblaze 0~total low memory
> powerpc 0~2G
> s390 0~2G
> sparc 0~ total low memory
> i386 0~16M
>
> >
> > >
> > > btw. there are actually two kmalloc allocations which pass GFP_DMA32;
> > > I guess this is broken(?):
> > >
> > > drivers/hid/intel-ish-hid/ishtp-fw-loader.c: dma_buf = kmalloc(payload_max_size, GFP_KERNEL | GFP_DMA32);
> > > drivers/media/test-drivers/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);
> >
> > Yes, this is completely broken.
> >
>