Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines

From: Al Viro
Date: Mon Sep 26 2022 - 15:55:42 EST


On Mon, Sep 26, 2022 at 08:53:43AM -0700, Christoph Hellwig wrote:
> On Fri, Sep 23, 2022 at 05:13:42PM +0100, Al Viro wrote:
> > You are mixing two issues here - holding references to pages while using
> > iov_iter instance is obvious; holding them until async IO is complete, even
> > though struct iov_iter might be long gone by that point is a different
> > story.
>
> But someone needs to hold a refernce until the I/O is completed, because
> the I/O obviously needs the pages. Yes, we could say the callers holds
> them and can drop the references right after I/O submission, while
> the method needs to grab another reference. But that is more
> complicated and is more costly than just holding the damn reference.

Take a look at __nfs_create_request(). And trace the call chains leading
to nfs_clear_request() where the corresponding put_page() happens.

What I'm afraid of is something similar in the bowels of some RDMA driver.
With upper layers shoving page references into sglist using iov_iter_get_pages(),
then passing sglist to some intermediate layer, then *that* getting passed down
into a driver which grabs references for its own use and releases them from
destructor of some private structure. Done via kref_put(). Have that
delayed by, hell - anything, up to and including debugfs shite somewhere
in the same driver, iterating through those private structures, grabbing
a reference to do some pretty-print into kmalloc'ed buffer, then drooping it.
Voila - we have page refs duplicated from ITER_BVEC and occasionally staying
around after the ->ki_complete() of async ->write_iter() that got that
ITER_BVEC.

It's really not a trivial rule change.