[PATCH] aio: prevent the final fput() in the middle of vfs_poll() (Re: KASAN: use-after-free Read in unix_dgram_poll)

From: Al Viro
Date: Sun Mar 03 2019 - 10:19:03 EST


On Sun, Mar 03, 2019 at 01:55:02PM +0000, Al Viro wrote:

> Maybe unrelated to this bug, but... What's to prevent a wakeup
> that happens just after we'd been added to a waitqueue by ->poll()
> triggering aio_poll_wake(), which gets to aio_poll_complete()
> with its fput() *before* we'd reached the end of ->poll() instance?
>
> I wonder if adding
> get_file(req->file); // make sure that early completion is safe
> right after
> req->file = fget(iocb->aio_fildes);
> if (unlikely(!req->file))
> return -EBADF;
> paired with
> fput(req->file);
> right after out: in aio_poll() is needed... Am I missing something
> obvious here? Christoph?

In more details - normally IOCB_CMD_POLL handling looks so:

1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll()
2) aio_poll() resolves the descriptor to struct file by
req->file = fget(iocb->aio_fildes)
3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
aio_kiocb to 2 (bumps by 1, that is).
4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait()
had been called and only once") it locks the queue. That's what the extra
reference to iocb had been for - we know we can safely access it.
5) With queue locked, we check if ->woken has already been set to true
(by aio_poll_wake()) and, if it had been, we unlock the queue, drop
a reference to aio_kiocb and bugger off - at that point it's
a responsibility to aio_poll_wake() and the stuff called/scheduled by
it. That code will drop the reference to file in req->file, along
with the other reference to our aio_kiocb.
6) otherwise, we see whether we need to wait. If we do, we unlock
the queue, drop one reference to aio_kiocb and go away - eventual
wakeup (or cancel) will deal with the reference to file and with the
other reference to aio_kiocb
7) otherwise we remove ourselves from waitqueue (still under the queue
lock), so that wakeup won't get us. No async activity will be happening,
so we can safely drop req->file and iocb ourselves.

If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
won't get freed under us, so we can do all the checks and locking safely.
And we don't touch ->file if we detect that case.

However, vfs_poll() most certainly *does* touch the file it had been
given. So wakeup coming while we are still in ->poll() might end up
doing fput() on that file. That case is not too rare, and usually
we are saved by the still present reference from descriptor table -
that fput() is not the final one. But if another thread closes that
descriptor right after our fget() and wakeup does happen before
->poll() returns, we are in trouble - final fput() done while we
are in the middle of a method.

What we need is to hold on to the file reference the same way we do with
aio_kiocb. No need to store the reference to what we'd got in a separate
variable - req->file is never reassigned and we'd already made sure that
req won't be freed under us, so dropping the extra reference with
fput(req->file) is fine in all cases.

Fixes: bfe4037e722ec
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
---
diff --git a/fs/aio.c b/fs/aio.c
index 3083180a54c8..7e88bfabdac2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1767,6 +1767,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)

/* one for removal from waitqueue, one for this function */
refcount_set(&aiocb->ki_refcnt, 2);
+ get_file(req->file);

mask = vfs_poll(req->file, &apt.pt) & req->events;
if (unlikely(!req->head)) {
@@ -1793,6 +1794,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
spin_unlock_irq(&ctx->ctx_lock);

out:
+ fput(req->file);
if (unlikely(apt.error)) {
fput(req->file);
return apt.error;