Re: [PATCH -next] mm: delete oversized WARN_ON() in kvmalloc() calls

From: Leon Romanovsky
Date: Thu Dec 02 2021 - 11:08:49 EST


On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > The problem is that this WARN_ON() is triggered by the users.
>
> ... or the problem is that you don't do a sanity check between the user
> and the MM system. I mean, that's what this conversation is about --
> is it a bug to be asking for this much memory in the first place?

We do a lot of checks, and in this case the user provided valid input:
he asked for a size that doesn't overflow his address space.
https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67

        start = ALIGN_DOWN(umem_odp->umem.address, page_size);
        if (check_add_overflow(umem_odp->umem.address,
                               (unsigned long)umem_odp->umem.length,
                               &end))
                return -EOVERFLOW;
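
For illustration (a rough sketch, not verbatim driver code; names
follow the hunk quoted further down), the validated range then sets
how many page-frame entries get allocated, so the allocation size
scales linearly with the length the user registered:

        /* Sketch: one pfn_list entry per page of the user-supplied
         * range, so a perfectly valid multi-TB registration turns
         * into a multi-GB kvcalloc() request. */
        npfns = (end - start) >> umem_odp->page_shift;
        umem_odp->pfn_list = kvcalloc(npfns, sizeof(*umem_odp->pfn_list),
                                      GFP_KERNEL);
        if (!umem_odp->pfn_list)
                return -ENOMEM;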

There is a feature called ODP (on-demand paging) which is supported
by some RDMA NICs. It allows the user to "export" their whole address
space to the other RDMA node without pinning the pages. And once the
other node sends data to a not-yet-pinned page, the RDMA NIC will
prefetch it.

>
> > At least in the RDMA world, users can provide huge sizes and they expect
> > to get plain -ENOMEM and not dump stack, because it happens indirectly
> > to them.
> >
> > In our case, these two kvcalloc() calls generate the WARN_ON().
> >
> >         umem_odp->pfn_list = kvcalloc(
> >                 npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
>
> Does it really make sense for the user to specify 2^31 PFNs in a single
> call? I mean, that's 8TB of memory. Should RDMA put its own limit
> in here, or should it rely on kvmalloc returning -ENOMEM?

I have heard about systems with that much available RAM. I don't know
their usage pattern; most likely they will use hugepages, but that is
just my guess.
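
For scale (assuming 4 KiB pages and 8-byte pfn_list entries, which is
an assumption for illustration): 2^31 PFNs cover 2^31 * 4 KiB = 8 TiB
of address space, while the pfn_list itself needs 2^31 * 8 bytes =
16 GiB, so the INT_MAX limit behind this WARN_ON() is already crossed
at roughly 1 TiB of registered memory.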

The thing is that this is not RDMA-specific. You are asking to place
the same "if (size > KVMALLOC...)" check in every subsystem.
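
Concretely, that would mean open-coding something like this in every
caller that feeds a user-controlled count into kvcalloc()/kvmalloc()
(a rough sketch; the INT_MAX bound stands in for whatever KVMALLOC
limit would be used, and the -ENOMEM path is illustrative):

        /* Hypothetical per-caller guard: fail the request before
         * kvmalloc() gets a chance to WARN on an oversized size.
         * INT_MAX mirrors the allocator's internal cutoff and is an
         * assumption for illustration only. */
        if (npfns > INT_MAX / sizeof(*umem_odp->pfn_list))
                return -ENOMEM;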

Thanks