Re: Crash in ide_do_request() on card removal

From: Jens Axboe
Date: Tue Aug 02 2005 - 05:50:18 EST


On Tue, Aug 02 2005, Steven Scholz wrote:
> Steven Scholz wrote:
>
> >Hi there,
> >
> >when surprisingly removing a CF ATA card (without unmounting before) I
> >sometimes get kernel crashes in ide_do_request() (linux-2.6.13-rc4 on ARM):
> >
> >cardmgr[194]: shutting down socket 0
> >cardmgr[194]: executing: './ide stop hda'
> >cardmgr[194]: + umount -v /dev/hda1
> >Assertion '(hwgroup->drive)' failed in
> >drivers/ide/ide-io.c:ide_do_request(1130)
> >Assertion '(drive)' failed in drivers/ide/ide-io.c:choose_drive(1035)
> >Unable to handle kernel NULL pointer dereference at virtual address
> >00000010
> >pgd = c0e34000
> >[00000010] *pgd=20eb0031, *pte=00000000, *ppte=00000000
> >Internal error: Oops: 17 [#1]
> >Modules linked in: ide_cs pcmcia at91_cf pcmcia_core
> >CPU: 0
> >PC is at ide_do_request+0x100/0x480
> >LR is at 0x1
> >pc : [<c00f9980>] lr : [<00000001>] Not tainted
> >...
> >
> >As the assertions show "drive" is NULL (due to the card removal?) and
> >thus the kernel crashes ...
> >
> >Upon card removal the pcmcia cardmgr tries to unmount the drive which
> >disapeared.
> >
> >("sometimes" above means that the rest of the time the kernel is not
> >dumping core, but the umount process hangs forever.)
>
> (I think) I found the reason for this behaviour:
>
> Upon card removal the functions
>
> ~ # cardctl eject
> ide_release(398)
> ide_unregister(585): index=0
> blk_unregister_queue(3603)
> elv_unregister_queue(549)
> ide_unregister(698)
> ide_detach(164)
>
> are called. Thus the request queue for the drive is discarded which is fair
> enough. But disk->queue would still point to a (now invalid)
> request_queue_t structure. Thus if I/O requests (e.g. "umount") are started
> _after_ the drive was removed bad things can happen! So I think we should
> explicitly remove the reference to that queue by doing
>
> void blk_unregister_queue(struct gendisk *disk)
> {
> request_queue_t *q = disk->queue;
>
> if (q && q->request_fn) {
> elv_unregister_queue(q);
> kobject_unregister(&q->kobj);
> + disk->queue = NULL;
> kobject_put(&disk->kobj);
> }
> }

That's not quite true, q is not invalid after this call. It will only be
invalid when it is freed (which doesn't happen from here but rather from
the blk_cleanup_queue() call when the reference count drops to 0).

This is still not perfect, but a lot better. Does it work for you?

--- linux-2.6.12/drivers/ide/ide-disk.c~ 2005-08-02 12:48:16.000000000 +0200
+++ linux-2.6.12/drivers/ide/ide-disk.c 2005-08-02 12:48:32.000000000 +0200
@@ -1054,6 +1054,7 @@
drive->driver_data = NULL;
drive->devfs_name[0] = '\0';
g->private_data = NULL;
+ g->disk = NULL;
put_disk(g);
kfree(idkp);
}

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/