[PATCH 1/1] NBD: fix I/O hang on disconnected nbds

From: Paul Clements
Date: Mon Feb 09 2009 - 13:22:38 EST


This patch fixes a problem that causes I/O to a disconnected
(or partially initialized) nbd device to hang indefinitely. To reproduce:

# ioctl NBD_SET_SIZE_BLOCKS /dev/nbd23 514048
# dd if=/dev/nbd23 of=/dev/null bs=4096 count=1

...hangs...
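
The "ioctl" command in the first step is presumably a small local helper
rather than a standard utility. A minimal sketch of such a helper,
assuming a Linux system with the kernel headers installed, might look
like the following. The function name nbd_set_size_blocks is
hypothetical; NBD_SET_SIZE_BLOCKS comes from <linux/nbd.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/nbd.h>   /* NBD_SET_SIZE_BLOCKS */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this patch): set the size, in blocks,
 * of an nbd device without attaching a socket to it.
 * Returns 0 on success, -1 on failure with errno set.
 */
int nbd_set_size_blocks(const char *dev, unsigned long blocks)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, NBD_SET_SIZE_BLOCKS, blocks);
    int saved_errno = errno;   /* close() may overwrite errno */

    close(fd);
    errno = saved_errno;
    return rc < 0 ? -1 : 0;
}
```

Calling nbd_set_size_blocks("/dev/nbd23", 514048) as root and then
running the dd command should reproduce the setup described above.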

This can also occur when an nbd device loses its nbd-client/server
connection. Although we clear the queue of any outstanding I/Os after the
client/server connection fails, any additional I/Os that get queued later
will hang.

This may also be the problem described in the following bug report:
http://bugzilla.kernel.org/show_bug.cgi?id=12277

Testing is needed to determine whether the two issues are the same.

This problem was introduced by the new request handling thread code
("NBD: allow nbd to be used locally", 3/2008), which entered mainline
around 2.6.25.

The fix, which is fairly simple, is to restore the check for lo->sock
being NULL in do_nbd_request. This causes I/O to an uninitialized nbd to
immediately fail with an I/O error, as it did prior to the introduction of
this bug.
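
To illustrate the reasoning, here is a small user-space model of the
restored check. It is only a sketch: the struct and function names
(model_device, model_request, model_submit) are made up for illustration
and are not kernel APIs. The point is that a request submitted while the
socket pointer is NULL is completed with an error right away instead of
being parked on a queue that nothing will ever drain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a block request; fields are illustrative. */
struct model_request {
    int errors;       /* incremented when the request fails */
    bool completed;   /* true once the submitter has been notified */
    bool queued;      /* true if handed to the sender thread */
};

/* Simplified stand-in for the nbd device state. */
struct model_device {
    void *sock;       /* NULL when there is no client/server connection */
    size_t waiting;   /* requests parked for the sender thread */
};

/*
 * Mirrors the shape of the restored check.
 * Returns true if the request was queued for sending,
 * false if it was failed immediately.
 */
bool model_submit(struct model_device *dev, struct model_request *req)
{
    if (dev->sock == NULL) {
        /* No connection: fail now rather than queue forever. */
        req->errors++;
        req->completed = true;
        return false;
    }

    /* Connected: hand the request to the sender thread's queue. */
    req->queued = true;
    dev->waiting++;
    return true;
}
```

Without the NULL check, the request in the disconnected case would take
the second path and wait indefinitely, which corresponds to the hang
seen with dd.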

--
Paul

Signed-off-by: Paul Clements <paul.clements@xxxxxxxxxxxx>
---

nbd.c | 9 +++++++++
1 files changed, 9 insertions(+)

--- ./drivers/block/nbd.c.PRISTINE 2009-02-09 12:41:09.000000000 -0500
+++ ./drivers/block/nbd.c 2009-02-09 12:41:19.000000000 -0500
@@ -547,6 +547,15 @@ static void do_nbd_request(struct reques

 		BUG_ON(lo->magic != LO_MAGIC);
 
+		if (unlikely(!lo->sock)) {
+			printk(KERN_ERR "%s: Attempted send on closed socket\n",
+				lo->disk->disk_name);
+			req->errors++;
+			nbd_end_request(req);
+			spin_lock_irq(q->queue_lock);
+			continue;
+		}
+
 		spin_lock_irq(&lo->queue_lock);
 		list_add_tail(&req->queuelist, &lo->waiting_queue);
 		spin_unlock_irq(&lo->queue_lock);